Visual program specification and compilation of graph-based computation

ABSTRACT

Graph-based computation includes accepting specification information for the graph-based computation, the specification information including a plurality of graph elements, and providing a visual representation of the specification information to a user. A visual representation of a plurality of groups of the graph elements is determined based on the accepted specification information, including determining a spatial extent of a spatial region for at least a first group of the plurality of groups based at least in part on a spatial extent of each of a plurality of graph elements. A visual representation of spatial regions for the plurality of groups is presented in conjunction with the visual representation of the specification information, the visual representation of each spatial region including visual representations of at least some of the graph elements in the group corresponding to that spatial region.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Application Ser. No. 62/164,175, filed on May 20, 2015, U.S. Application Ser. No. 62/213,343, filed Sep. 2, 2015, and U.S. Application Ser. No. 62/253,422, filed Nov. 10, 2015, each of which is incorporated herein by reference.

BACKGROUND

This description relates to an approach to visual program specification and compilation of graph-based computation.

One approach to data flow computation makes use of a graph-based representation in which computational components corresponding to nodes (vertices) of a graph are coupled by data flows corresponding to links (directed edges) of the graph (called a “dataflow graph”). A downstream component connected to an upstream component by a data flow link receives an ordered stream of input data elements, and processes the input data elements in the received order, optionally generating one or more corresponding flows of output data elements. A system for executing such graph-based computations is described in prior U.S. Pat. No. 5,966,072, titled “EXECUTING COMPUTATIONS EXPRESSED AS GRAPHS,” incorporated herein by reference. In an implementation related to the approach described in that prior patent, each component is implemented as a process that is hosted on one of typically multiple computer servers. Each computer server may have multiple such component processes active at any one time, and an operating system (e.g., Unix) scheduler shares resources (e.g., processor time, and/or processor cores) among the components hosted on that server. In such an implementation, data flows between components may be implemented using data communication services of the operating system and data network connecting the servers (e.g., named pipes, TCP/IP sessions, etc.). A subset of the components generally serve as sources and/or sinks of data from the overall computation, for example, to and/or from data files, database tables, and external data flows. After the component processes and data flows are established, for example, by a coordinating process, data then flows through the overall computation system implementing the computation expressed as a graph generally governed by availability of input data at each component and scheduling of computing resources for each of the components.
Parallelism can therefore be achieved at least by enabling different components to be executed in parallel by different processes (hosted on the same or different server computers or processor cores), where different components executing in parallel on different paths through a dataflow graph is referred to herein as component parallelism, and different components executing in parallel on different portions of the same path through a dataflow graph is referred to herein as pipeline parallelism.

Other forms of parallelism are also supported by such an approach. For example, an input data set may be partitioned, for example, according to a partition of values of a field in records of the data set, with each part being sent to a separate copy of a component that processes records of the data set. Such separate copies (or “instances”) of a component may be executed on separate server computers or separate processor cores of a server computer, thereby achieving what is referred to herein as data parallelism. The results of the separate components may be merged to again form a single data flow or data set. The number of computers or processor cores used to execute instances of the component would be designated by a developer at the time the dataflow graph is developed.
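As an illustration of the data parallelism described above, the following is a minimal sketch, assuming records are dictionaries and that a part is chosen from the byte values of one field. The function names, the field names, and the partitioning rule are hypothetical and are not taken from any particular system.

```python
def partition_records(records, field, num_parts):
    """Split records into num_parts lists according to the value of one field."""
    parts = [[] for _ in range(num_parts)]
    for record in records:
        # Hypothetical partitioning rule: sum of the UTF-8 bytes of the field value.
        index = sum(str(record[field]).encode("utf-8")) % num_parts
        parts[index].append(record)
    return parts


def total_amount(part):
    """Stand-in for one instance of a component processing its own part."""
    return sum(record["amount"] for record in part)


records = [
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": 7},
    {"customer": "a", "amount": 1},
    {"customer": "c", "amount": 2},
]

parts = partition_records(records, "customer", 2)
partial_results = [total_amount(part) for part in parts]  # one result per instance
merged = sum(partial_results)  # merge the separate results into a single value
```

In this sketch every record with the same field value lands in the same part, so each instance could run on a separate server or processor core before the results are merged.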

Various approaches may be used to improve efficiency of such an approach. For example, each instance of a component does not necessarily have to be hosted in its own operating system process, for example, using one operating system process to implement multiple components (e.g., components forming a connected subgraph of a larger graph).

At least some implementations of the approach described above suffer from limitations in relation to the efficiency of execution of the resulting processes on the underlying computer servers. For example, the limitations may be related to difficulty in reconfiguring a running instance of a graph to change a degree of data parallelism, to change the servers that host various components, and/or to balance load on different computation resources. Existing graph-based computation systems also suffer from slow startup times, often because too many processes are initiated unnecessarily, wasting large amounts of memory. Generally, processes start at the start-up of graph execution, and end when graph execution completes.

Other systems for distributing computation have been used in which an overall computation is divided into smaller parts, and the parts are distributed from one master computer server to various other (e.g., “slave”) computer servers, which each independently perform a computation and which return their result to a master server. Some of such approaches are referred to as “grid computing.” However, such approaches generally rely on the independence of each computation, without providing a mechanism for passing data between the computation parts, or scheduling and/or sequencing execution of the parts, except via the master computer server that invokes those parts. Therefore such approaches do not provide a direct and efficient solution to hosting computation involving interactions between multiple components.

Another approach for distributed computation on a large dataset makes use of a MapReduce framework, for example, as embodied in the Apache Hadoop® system. Generally, Hadoop has a distributed filesystem in which parts for each named file are distributed. A user specifies a computation in terms of two functions: a map function, which is executed on all the parts of the named inputs in a distributed manner, and a reduce function that is executed on parts of the output of the map function executions. The outputs of the map function executions are partitioned and stored in intermediate parts again in the distributed filesystem. The reduce function is then executed in a distributed manner to process the intermediate parts, yielding the result of the overall computation. Although computations that can be expressed in a MapReduce framework, and whose inputs and outputs are amenable to storage within the filesystem of the map-reduce framework, can be executed efficiently, many computations do not match this framework and/or are not easily adapted to have all their inputs and outputs within the distributed filesystem.

In general, there is a need to increase computational efficiency (e.g., increase a number of records processed per unit of given computing resources) of a computation whose underlying specification is in terms of a graph, as compared to approaches described above, in which components (or parallel executing copies of components) are hosted on different servers. Furthermore, it is desirable to be able to adapt to varying computation resources and requirements. There is also a need to provide a computation approach that permits adapting to variation in the computing resources that are available during execution of one or more graph-based computations, and/or to variations in the computation load or time variation of load of different components of such computations, for example, due to characteristics of the data being processed. There is also a need to provide a computation approach that is able to efficiently make use of computational resources with different characteristics, for example, using servers that have different numbers of processors per server, different numbers of processor cores per processor, etc., and to support both homogeneous as well as heterogeneous environments efficiently. There is also a desire to make the start-up of graph-based computations quick. One aspect of providing such efficiency and adaptability is providing appropriate separation and abstraction barriers between choices made by a developer at the time of graph creation (at design-time), actions taken by a compiler (at compile-time), and actions taken by the runtime system (at runtime).

In some examples, the program is specified directly in a graph-based form, for instance, in a visual programming environment in which nodes and directed links of the graph-based specification are represented by graphical objects, for example, with a directed link represented as a line or arrow and a node represented as a box or other regular shape (e.g., rectangle, circle, etc.) in a graphical user interface. Note that the term “visual” is used to refer to the use of visual representations of the program elements in the environment, while “graph-based” is used to refer to the use of nodes and links, which may be represented graphically. In this document, it should be understood that a “visual representation” of a graph-based computation includes graphical objects representing the nodes and links of the graph representing the computation. But it should be understood that graph-based specifications do not necessarily have to be represented visually, and visual programming environments do not necessarily use graphical objects representing nodes and/or links of a graph.

A number of visual programming environments for specification of graph-based programs are in use today. For example, the Co>Operating System of Ab Initio Software Corporation, LabVIEW of National Instruments Corporation, and Simulink of The Mathworks, Inc. provide visual programming environments, which allow a programmer to specify a graph-based computation, which is then compiled or executed directly.

In some examples of visual specification of graph-based programs, the syntax of the specification is relatively straightforward. For example, in a dataflow computation specification, the links may represent an ordered transfer of data records, and the nodes may represent a transformation of one or more records on one or more links to produce one or more records on one or more links. For example, in a simple case, one record is accepted from each input link and one record is provided to each output link.
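The simple case described above, in which a node accepts one record from each input link and provides one record on its output link, can be sketched as follows. This is an illustrative sketch only: links are modeled as Python iterators, and the name `run_node` and the record fields are assumptions made for the example.

```python
def run_node(transform, *input_links):
    """Accept one record from each input link per step and emit one output record."""
    for records in zip(*input_links):
        yield transform(*records)


# Two input links carrying ordered streams of records (hypothetical data).
orders = iter([{"id": 1, "qty": 2}, {"id": 2, "qty": 5}])
prices = iter([{"id": 1, "price": 3.0}, {"id": 2, "price": 1.5}])


def order_total(order, price):
    """Transformation applied by the node to one record from each input link."""
    return {"id": order["id"], "total": order["qty"] * price["price"]}


totals = list(run_node(order_total, orders, prices))
```

The records are consumed in the order in which they arrive on each link, matching the ordered transfer of data records that a link represents.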

In some examples, the syntax may be more complicated. For example, the user may explicitly indicate a grouping or nesting of nodes. For example, some current visual programming environments include loop syntax where the processing that is to be repeated for each iteration of a loop is enclosed in a box element. A compiler of the graph-based program processes the program specification consistent with the explicit syntax, for example, for a looping construct. In some examples, a user explicitly identifies a group of elements, and a border is drawn around the group, which is then treated as a subsystem. In another approach, a graph specification may have a subportion that is explicitly indicated (e.g., by a user-inserted box) as following a different syntax than outside that portion. In some examples of visual programming environments for graph-based computation, a group of elements may be identified to the user, for example, to show a program error, by calling attention to the group with color or by drawing a border around the group.

SUMMARY

In a general aspect, representation of groups of elements in a visual representation of a graph-based computation provides value to a user of that visual representation. This value only increases with the complexity of the visual representation, and the complexity of the groups, for example, resulting from positioning of the elements of a group, determination of groups themselves, for example, based on analysis of syntactic elements of the visual representation, or nesting of groups to multiple levels and/or having disjoint “peer” groups at any one level. Prior approaches are limited, for example: (a) to a single group or level of group (i.e., without nesting), (b) to having a regular boundary (e.g., a rectangle) within which the elements of a group are represented in the visual representation and/or from which elements not in that group are excluded in the representation, (c) bounding elements of the group to remain within a boundary as compared to adapting a boundary to movement of elements of the group in the visual representation, and/or (d) requiring the user to specify the elements of a group as compared to automated syntactic (or semantic) analysis of the visual representation and/or the underlying graph-based computation. In order to support specification of complex graph-based computations in a visual representation (e.g., in a visual development environment), a visual environment provides feedback to a user to indicate the interpretation of the graph-based computation that a system will use for processing data (e.g., after compilation, or directly).
Removing some or all of the limitations of previous approaches provides a way for a user to visualize and specify potentially complex graph-based computations with a first computer hosting the visual environment, automatically providing visual feedback of the computation being specified, and causing a second computer (or multicomputer system) to perform the computation corresponding directly to the visual feedback, for example, by compiling the visual representation into a runnable form for execution on the second computer (or multicomputer system). Advantages include efficiency in specification of the computation, and more importantly, accuracy and reduction of errors in the computation that is performed based on the intent of the user specifying or viewing the visual representation.

In another aspect, in general, a syntax of a visual representation of a graph-based computation does not require explicit specification by the user of a bounding construct such as a box or outline around a nested group of elements of the graph. In some but not necessarily all examples, the nesting or grouping of elements of the visual representation is determined according to syntax (or in some cases the semantics) of the visual representation.

There are a number of reasons why avoiding a requirement for the user to explicitly specify a bounding construct is desirable. These reasons include one or more of the following: ease of entry by avoiding a step, avoiding complexity of having to arrange elements in the visual representation in a manner that permits or is consistent with the bounding construct, avoiding the need to adjust the bounding construct and the contained elements when they are rearranged or added to, and improving the clarity of a visual representation by avoiding unnecessary “clutter” with bounding constructs.

On the other hand, there are reasons for a user to be aware of the grouping and/or nesting of the elements that may be determined by the syntax and/or semantics of the visual representation, or that may have been determined by other means. For example, such grouping and nesting may be significant to the syntactic and/or semantic interpretation of the program specification. For example, an unintended grouping could show the user errors or undesired syntax that resulted in that grouping. Therefore, even though it may be desirable for the user to not have to explicitly specify the grouping and/or nesting, it may nevertheless be desirable to identify and visualize such grouping and nesting to the user. Furthermore, to the extent that the interpretation of the program specification is dependent on determined grouping and nesting of its elements, the identification of the grouping and nesting to the user can serve to increase the accuracy of the task of program specification by identifying potential programming errors to the user. To this end, showing the same analysis of the program specification used to provide visual feedback to the user as is used in the compilation process that ultimately controls the execution of the program and the resulting data processing according to the program may be important. Therefore, more than merely being an aid to program entry, the visual feedback of grouping and/or nesting of elements in the visual representation provides a technical advantage by avoiding processing of data in a manner not intended by the user providing the program specification.

In another aspect, in general, an approach to visually representing one, two, three or more levels of nested grouping of elements in a visual representation of a graph-based computation includes determining, for each group, a spatial extent of a (connected or possibly disconnected) region of the visual representation containing the elements of that group. The region is then visually indicated in the visual representation, for example, using visual representation techniques that use: color (including grayscale) (e.g., a background color, color of the elements in the group, etc.); a border of the region (e.g., a line along the outline of the region); shading (e.g., representing an “elevation” of each group in a three-dimensional perspective view, or using different hues for different regions); or a combination of one or more of these or other visual representation techniques.

In some examples, the visual representation of the regions may also include annotation of (e.g., marking an edge) or insertion of additional elements (e.g., inserting an element on an existing edge) in the visual representation at or near an intersection of an outline of a region and an element of the visual representation. For example, in a case in which the elements that are grouped are nodes (e.g., represented as boxes) that are linked by arrows, an outline of a region will in general intersect a number of the arrows. The annotation or insertion of additional elements may include annotating or adding a visual element on the arrow joining nodes in different (i.e., nested) regions.

In some examples, the determining of the spatial extent of regions, with each region associated with a corresponding group of elements of the visual representation, is performed in two steps. First characterizations of a candidate set of outlines are determined. For example, these first outlines are consistent with the nesting and grouping, and do not violate required properties such as non-intersection of the outlines of the blocks. Then, these first characterizations are used to determine second characterizations of adjusted outlines that match (e.g., optimize) criteria related to the visual representation of the regions, while maintaining required properties such as full inclusion of the visual representation of the contained elements as well as non-intersection with other outlines. The criteria that are matched can include one or more of:

area of the region, for example, with preference being to have compact area;
convexity, for example, avoiding concave (e.g., curved inward) sections of the outlines that are not necessitated to avoid intersection of different outlines;
preference for a connected versus a disconnected region for any particular group; and
spacing of outlines, for example, requiring a minimum spacing between outlines of different groups.

In some examples, the outlines characterized by the second characterization are further adjusted, for example, adjusting curvature of the outlines (e.g., to “round” corners of the regions).

In some examples, forming the first characterization makes use of a tessellation of the region of the visual representation, which in some examples, is a Delaunay triangulation of the region. In some examples, each visual element of a group is represented as a polygon (or a line or connected lines). Vertices of the tiles of the tessellation are located (or otherwise correspond) to points on the polygon, for example, vertices of the polygon. The first characterization of the outlines then comprises intersections of the outlines and the boundaries of the tiles of the tessellation.

In some examples, each vertex of the tessellation is associated with a label from a set of partially ordered labels. In the case of polygon (e.g., triangle) shaped tiles, the number of intersections on a side of a tile depends on the labels of the ends of that side (i.e., on the labels of the graph element polygons or lines). If the partial ordering of the labels is represented in a tree, as discussed below, the number of region outline intersections on the side of the tile equals the minimum number of edges of the tree that are traversed between the labels. In some examples, the first characterization spaces these intersections in a regular manner (e.g., with uniform spacing) along the side of the tile.
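The counting and uniform spacing of outline intersections along one side of a tile can be sketched as follows. This is a minimal sketch under stated assumptions: each label is modeled as a tuple giving the path from the root of the label tree, and the function names `tree_distance` and `initial_intersections` are hypothetical.

```python
def tree_distance(label_a, label_b):
    """Minimum number of tree edges traversed between two labels.

    Each label is a tuple giving the path from the root, e.g. () for the
    root, (0,) for a top-level group, (0, 1) for a group nested within it.
    """
    common = 0
    for a, b in zip(label_a, label_b):
        if a != b:
            break
        common += 1
    return (len(label_a) - common) + (len(label_b) - common)


def initial_intersections(point_a, label_a, point_b, label_b):
    """Place one intersection per separating outline, uniformly spaced on a tile side."""
    count = tree_distance(label_a, label_b)
    (ax, ay), (bx, by) = point_a, point_b
    return [
        (ax + k * (bx - ax) / (count + 1), ay + k * (by - ay) / (count + 1))
        for k in range(1, count + 1)
    ]


# Two elements in sibling groups (0,) and (1,) at the ends of one tile side.
crossings = initial_intersections((0.0, 0.0), (0,), (3.0, 0.0), (1,))
```

For the sibling groups in this example the side carries two intersections, one for the outline of each group, even though both groups are at the same depth of the tree.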

In some examples, the second characterization of the outlines uses the same tessellation of the region, and corresponds to an adjustment of the intersection points of the region outlines and the sides of the tiles of the tessellation.

It should be understood that techniques are known for forming contour plots that are consistent with spatially distributed data points using tessellation in which vertices of the tiles are at the locations of the data points. However, although there may be an apparent similarity between drawing contours and determining outlines of nested regions, the problems and their solutions are different. A first difference is that the labels of the groups are, in general, partially ordered and not fully ordered (e.g., the labels cannot be mapped to real numbers representing the same ordering information). For example, an elevation is a fully ordered scale. In the present approach, two groups may have labels at a same “level” (e.g., a same depth in a tree representation of the partial ordering). If two visual elements, one from each group, are adjacent, a conventional contour drawing approach would have no need to insert a contour between them, essentially because they are at the same “elevation” or “level.” On the other hand, in a number of embodiments of the present approach, there would in general be at least two contours separating the visual elements: one contour separating the first region from the rest of an enclosing region in which it and the second region are nested, and another contour separating the second region from the rest of the enclosing region. A further difference is that in the case of contour drawing, there is not in general a need to optimize the shape of the contours, for example in a manner such as minimizing the region outlines as discussed above. Notwithstanding the observation that conventional tessellation-based contour drawing approaches are not directly applicable to the problem of determining region outlines corresponding to groups of visual elements, after these outlines are determined, the result can be interpreted to correspond to a “topography” on which the visual elements may be represented.

In some examples, the matching of the criteria in forming the second characterization of the outlines includes an optimization (i.e., minimization) of total length of the outlines, subject to constraints. The constraints may include, for example, a required separation of the outlines, or a required (e.g., minimum) radius of curvature of the outlines.
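One way to picture a constrained length minimization of this kind is the following sketch. It assumes a single closed outline whose points are each confined to one tile side and parameterized by a fraction `t` along that side, and it uses a simple coordinate descent with a ternary search as a stand-in optimizer. The parameter `margin` is a hypothetical stand-in for a separation limit from the ends of the side; none of these names or choices is taken from the source.

```python
import math


def point_on_side(side, t):
    """Point at fraction t along a tile side given as ((px, py), (qx, qy))."""
    (px, py), (qx, qy) = side
    return (px + t * (qx - px), py + t * (qy - py))


def outline_length(sides, ts):
    """Total length of the closed outline through one point on each side."""
    pts = [point_on_side(s, t) for s, t in zip(sides, ts)]
    return sum(math.dist(pts[i], pts[(i + 1) % len(pts)]) for i in range(len(pts)))


def shorten_outline(sides, ts, margin=0.1, sweeps=50):
    """Slide each intersection along its side to reduce total outline length.

    Each t is kept within [margin, 1 - margin] (an assumed separation limit).
    """
    ts = list(ts)
    n = len(ts)
    for _ in range(sweeps):
        for i in range(n):
            prev_pt = point_on_side(sides[i - 1], ts[i - 1])
            next_pt = point_on_side(sides[(i + 1) % n], ts[(i + 1) % n])

            def cost(t):
                p = point_on_side(sides[i], t)
                return math.dist(prev_pt, p) + math.dist(p, next_pt)

            lo, hi = margin, 1.0 - margin
            for _ in range(40):  # ternary search on a convex one-dimensional cost
                m1 = lo + (hi - lo) / 3
                m2 = hi - (hi - lo) / 3
                if cost(m1) <= cost(m2):
                    hi = m2
                else:
                    lo = m1
            ts[i] = (lo + hi) / 2
    return ts


# Four tile sides radiating from one element vertex at the origin.
sides = [((0.0, 0.0), (4.0, 0.0)), ((0.0, 0.0), (0.0, 4.0)),
         ((0.0, 0.0), (-4.0, 0.0)), ((0.0, 0.0), (0.0, -4.0))]
start = [0.5, 0.5, 0.5, 0.5]
result = shorten_outline(sides, start, margin=0.25)
```

In this example the outline tightens around the element until every intersection reaches the assumed separation limit. A fuller treatment would also need to enforce separation between different outlines sharing a tile side.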

In some examples, computation and display of the regions may be dynamic, such that as elements of the visual representation are moved (e.g., by a user manipulating the representation), the regions corresponding to the groups are adjusted to match the updated locations of the elements. Similarly, if the group membership or the nesting relationship of groups changes, the display of the regions may be adjusted.

As introduced above, there are various ways in which the grouping and/or nesting of the graphical elements may be determined. An automated way of determining the grouping uses syntactic elements of the visual representation. For example, junctures between links and nodes, which may be represented as arrows and boxes in the visual representation, may include syntactic elements representing input or output “ports” of the nodes. Different types of ports may be represented using different visually distinct symbols. The visual representation may be automatically processed according to the syntactically relevant features of the representation to identify the groups and nesting of the groups. Other syntactic devices may be used, for example, using different types of connectors (e.g., arrows) between elements, or annotations on linking elements.

In some examples, the syntactic elements identify a granularity of representation of processing by the graph-based computation. For example, certain links, or junctures between links and nodes, which may be represented as ports, represent transfer of entire collections of records, which may be unordered, fully ordered, or partially ordered, where a node of the graph-based computation that has an output link representing a transfer of an entire collection is understood to have a specification that produces such a collection in its execution. Similarly, a node of the graph-based computation that has an input link representing transfer of an entire collection of records is understood to have a specification that consumes such a collection in its execution. On the other hand, certain links, or junctures between links and nodes, which may be represented as ports, represent transfer of at most one record (i.e., one record, or no record at all) in the execution of the computation associated with the node, and processing of a collection of records corresponds to repeated executions of the computation specified for the node. In some such examples, the grouping is such that a group of elements may represent processing of at most one record on each link entering the region and producing at most one record leaving the region, with at most one execution of each of the computations associated with each of the nodes of the group. Such a group may be referred to as a “scalar” group. Such a group may be nested within another group in which executions of computations associated with nodes in that group produce or consume entire collections of records.
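As a rough illustration of determining groups from link types, the following sketch merges nodes that are joined by links carrying at most one record, and treats links carrying entire collections as group boundaries. This grouping rule is a simplifying assumption made for the example, as are the link kinds `"scalar"` and `"collection"` and the function name `scalar_groups`; an actual analysis of ports may differ.

```python
def scalar_groups(nodes, links):
    """Group nodes connected by 'scalar' links, using a union-find structure.

    links is a list of (source, destination, kind) tuples, where kind is
    either "scalar" (at most one record) or "collection" (entire collection).
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, destination, kind in links:
        if kind == "scalar":
            parent[find(source)] = find(destination)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


nodes = ["read", "parse", "validate", "write"]
links = [
    ("read", "parse", "collection"),
    ("parse", "validate", "scalar"),
    ("validate", "write", "collection"),
]
groups = scalar_groups(nodes, links)
```

Here `parse` and `validate` form one group because a single record passes between them, while `read` and `write` remain outside that group because their links transfer entire collections.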

It should be understood that the graphical elements that are grouped are not necessarily the nodes of the graph being visually represented. For example, links represented as arrows may also be grouped (i.e., the spatial extent of the visual representation of the arrows is fully included in the determined regions for the groups to which they belong). In some examples, groups of elements that are grouped include both visual representations of nodes and visual representations of links of the graph representing the computation.

In some examples, the groupings are determined by associating each element of the visual representation (or equivalently each element of the graph representing the computation) with a label from a partially ordered set of labels. In some examples, each label may correspond to a path of nodes in a tree representing the partial ordering of the labels. For example, the entire visual representation may correspond to the root node of the tree, and each child of the root may correspond to a different disjoint group. For any node representing a group of elements, each child of that node represents a different disjoint nested group (e.g., subgroup of one or more, possibly all, elements) of that group. In the visual environment in which the regions and boundaries are defined consistent with the partial ordering (i.e., nesting) of the group labels, any path in the visual representation between a first region, corresponding to a first node in the tree, and a second region, corresponding to a second node in the tree, corresponds to a path within the tree from the first node to the second node of the tree such that each transition between nodes of the tree corresponds to traversal of a boundary of a region in the visual representation.
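The correspondence between a path in the label tree and the region boundaries that are traversed can be sketched as follows, assuming each label is a tuple naming the groups on the path from the root. The function name `boundaries_crossed` and the group names are hypothetical.

```python
def boundaries_crossed(label_a, label_b):
    """List the region boundaries traversed when moving from label_a to label_b.

    Each label is a tuple of group names from the root, e.g. ("A", "A1") for
    group A1 nested within group A. Moving up the tree exits a region, and
    moving down the tree enters one.
    """
    common = 0
    for a, b in zip(label_a, label_b):
        if a != b:
            break
        common += 1
    exits = [("exit", label_a[:k]) for k in range(len(label_a), common, -1)]
    enters = [("enter", label_b[:k]) for k in range(common + 1, len(label_b) + 1)]
    return exits + enters


# From an element in group A1 (nested in A) to an element in peer group B.
crossings = boundaries_crossed(("A", "A1"), ("B",))
```

In this example the path leaves the region for A1, then the region for A, and then enters the region for B, one boundary per edge of the tree.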

In an aspect, in general, a method for graph-based computation includes accepting specification information for the graph-based computation, the specification information including a plurality of graph elements, and providing a visual representation of the specification information to a user. In a first computation system, a visual representation of a plurality of groups of the graph elements is determined based on the accepted specification information, including determining a spatial extent of a spatial region for at least a first group of the plurality of groups based at least in part on a spatial extent of each of a plurality of graph elements. A visual representation of spatial regions for the plurality of groups is presented in conjunction with the visual representation of the specification information, the visual representation of each spatial region including visual representations of at least some of the graph elements in the group corresponding to that spatial region.

In some embodiments, a visual representation of the spatial region for the first group is contained within a visual representation of a spatial region for a second group of the plurality of groups, according to a nesting of the first group of graph elements within the second group of graph elements, where (1) the first group of graph elements is a subset of fewer than all graph elements in the second group of graph elements, and (2) each graph element in the first group of graph elements is directly connected to at least one other graph element in the first group of graph elements within the graph-based computation.

In some examples, determining the visual representation of the plurality of groups includes processing the accepted specification information to form the groups. Further, the method may include causing an execution of the graph-based computation on a second computation system to be consistent with the formed plurality of groups. For example, an executable representation of the graph-based computation may be formed from the specification information and the formed groups.

In some embodiments, the visual feedback of the regions and/or corresponding outlines of groups in the visual environment is computed by a first computer, for example, on a continuous basis or on demand from the user, to provide feedback that is consistent with the other input and modification of the visual representation of the graph-based computation by the user. This computation by the first computer includes automated determination of the groups and their nesting in order to identify the groups. Note that the inputs and modifications by the user may keep the groups unchanged, or may change the membership of a plurality of groups, for example, through a change of syntax (e.g., by a change of connection of ports of computation elements). In either case, the visual feedback of the regions of the groups may be updated in the visual representation. The same computation by the first computer to identify the groups is used by the first computer to compile and/or execute the computation on a second computer (or multicomputer system) in a manner that is consistent with the visual representation computed from that computation of the groups. Therefore, the visual representation (e.g., presented on a computer display to a user) and the computation, which performs the represented computation, are linked and consistent.

The specification information for the graph-based computation may include a specification of the plurality of graph elements, the specification of each graph element including a location of a visual representation of the graph element in a visual representation of the graph-based computation.

Determining the visual representation of the plurality of groups of the graph elements may include: forming a first characterization of a candidate set of outlines enclosing the spatial regions for the groups; and determining a second characterization of a final set of outlines enclosing the spatial regions for the groups from the first characterization. For example, forming the first characterization may include forming a tessellation of at least a part of the visual representation surrounding the graph elements. Forming the first characterization may include identifying intersections of edges of tiles of the tessellation and the set of outlines. Determining the second characterization may include modifying the intersections, which may include constraining the modified intersections according to separation limits between outlines or between outlines and graph elements. Determining the second characterization may further include smoothing an outline formed by joining the intersections.

In some implementations, the graph elements form a partially ordered set, and forming the first characterization includes determining a number of outlines separating pairs of graph elements according to the partial ordering. Forming the first characterization may include determining intersections of lines between visual representations of graph elements and the set of outlines according to the number of outlines separating the graph elements.

Determining the second characterization of a final set of outlines may include reducing a length of each of the candidate set of outlines to form the final set of outlines, where reducing the length is constrained by separation limits between outlines or between outlines and visual representations of graph elements.

At least some spatial region for a group of graph elements may include a disconnected spatial region.

In some implementations, each of the graph elements in the plurality of graph elements includes nodes in a graph that includes nodes interconnected by links.

In some implementations, each of one or more of the graph elements in the plurality of graph elements represents a computation step within the graph-based computation.

In some implementations, the visual representation of each spatial region includes visual representations of at least some of the graph elements in the group corresponding to that spatial region.

In some implementations, the spatial extent of the spatial region for the first group is specified by an outline enclosing the spatial region for the first group.

Aspects can have one or more of the following advantages.

This linking of the visual representation and computation performed addresses a technical problem of avoiding computation errors through misspecification of the computation. The automated feedback of the grouping and nesting of elements through the computation and rendering of spatial regions in the visual representation (e.g., in a visual interactive development environment) addresses a technical problem of input of a complex specification with the solution of inferring aggregated elements of the specification (e.g., the groups and their nesting) and rendering them (e.g., in the form of regions) without requiring explicit user input to specify the groups and/or to render them in the visual representation.

Other features and advantages of the invention will become apparent from the following description, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of a task-based computation system.

FIG. 2A is an example of a data processing graph.

FIG. 2B is an example of outlines for the data processing graph of FIG. 2A.

FIG. 2C is an example of a portion of a data processing graph with control and data ports.

FIGS. 2D-2E are examples of data processing graphs with control and data ports.

FIG. 3A is a data processing graph including a number of scalar output port to scalar input port connections.

FIG. 3B is a data processing graph including a number of collection output port to collection input port connections.

FIG. 3C is a data processing graph including a collection output port to scalar input port connection and a scalar output port to collection input port connection.

FIG. 4A is a scalar port to scalar port connection between two components.

FIG. 4B is a collection port to collection port connection between two components.

FIG. 4C is a collection port to scalar port connection between two components, including an execution set entry point.

FIG. 4D is a scalar port to collection port connection between two components, including an execution set exit point.

FIG. 5 is a data processing graph with a stack based assignment algorithm applied.

FIG. 6 is a data processing graph with a global mapping based assignment algorithm applied.

FIG. 7 is a data processing graph with user defined execution sets.

FIG. 8A and FIG. 8B illustrate a “same set as” relationship in a data processing graph.

FIG. 9 is a data processing graph with an entry point that replicates data elements.

FIGS. 10A-10C illustrate a user interface workflow.

FIG. 11A is a data processing graph with illegal execution sets.

FIG. 11B is a data processing graph with an illegal execution set loop.

FIGS. 12A-12B are diagrams of examples of data processing graphs and corresponding control graphs.

FIGS. 13A-13B are state transition diagrams for an example execution state machine.

FIG. 14 is a diagram of a set of processing engines.

FIG. 15 is a block diagram of a compiler/interpreter including a graphical user interface.

FIG. 16 is an example of a visual representation of a graph-based computation showing a spatial region associated with an execution set.

FIG. 17A is a diagram illustrating a triangular tessellation of the visual representation of FIG. 16.

FIG. 17B is a diagram illustrating an initial outline corresponding to the tessellation shown in FIG. 17A.

FIG. 17C is a diagram illustrating an adjusted outline corresponding to FIG. 17B.

FIGS. 17D-E are diagrams illustrating a “bumper” limitation on boundaries.

FIGS. 18A-B are diagrams illustrating a triangular tessellation and corresponding outlines, respectively, for a second example.

FIG. 19 is a diagram illustrating processing of overlapping blocks.

FIG. 20 is a diagram illustrating a triangular tessellation for a third example, corresponding to the outlines of FIG. 2B.

DESCRIPTION

Referring to FIG. 1, a task-based computation system 100 uses a high-level program specification 110 to control computation and storage resources of a computing platform 150 to execute the computation specified by the program specification 110. A compiler/interpreter 120 receives the high-level program specification 110 and generates a task-based specification 130 that is in a form that can be executed by a task-based runtime interface/controller 140. The compiler/interpreter 120 identifies one or more “execution sets” of one or more “components” that can be instantiated, individually or as a unit, as fine-grained tasks to be applied to each of multiple data elements. Part of the compilation or interpretation process involves identifying these execution sets and preparing the sets for execution, as described in more detail below. It should be understood that the compiler/interpreter 120 may use any of a variety of algorithms that include steps such as parsing the high-level program specification 110, verifying syntax, type checking data formats, generating any errors or warnings, and preparing the task-based specification 130, and the compiler/interpreter 120 can make use of a variety of techniques, for example, to optimize the efficiency of the computation performed on the computing platform 150. A target program specification generated by the compiler/interpreter 120 can itself be in an intermediate form that is to be further processed (e.g., further compiled, interpreted, etc.) by another part of the system 100 to produce the task-based specification 130. The discussion below outlines one or more examples of such transformations but of course other approaches to the transformations are possible as would be understood, for example, by one skilled in compiler design.

Generally, the computing platform 150 is made up of a number of computing nodes 152 (e.g., individual server computers that provide both distributed computation resources and distributed storage resources), thereby enabling high degrees of parallelism. As discussed in further detail below, the computation represented in the high-level program specification 110 is executed on the computing platform 150 as relatively fine-grain tasks, further enabling efficient parallel execution of the specified computation.

1 Data Processing Graphs

In some embodiments, the high-level program specification 110 is a type of graph-based program specification called a “data processing graph” that includes a set of “components”, each specifying a portion of an overall data processing computation to be performed on data. The components are represented, for example, in a programming user interface and/or in a data representation of the computation, as nodes in a graph. Unlike some graph-based program specifications, such as the dataflow graphs described in the Background above, the data processing graphs may include links between the nodes that represent any of transfer of data, or transfer of control, or both. One way to indicate the characteristics of the links is by providing different types of ports on the components. The links are directed links that are coupled from an output port of an upstream component to an input port of a downstream component. The ports have indicators that represent characteristics of how data elements are written and read from the links and/or how the components are controlled to process data.

These ports may have a number of different characteristics. One characteristic of a port is its directionality as an input port or output port. The directed links represent data and/or control being conveyed from an output port of an upstream component to an input port of a downstream component. A developer is permitted to link together ports of different types. Some of the data processing characteristics of the data processing graph depend on how ports of different types are linked together. For example, links between different types of ports can lead to nested subsets of components in different “execution sets” that provide a hierarchical form of parallelism, as described in more detail below. Certain data processing characteristics are implied by the type of the port. The different types of ports that a component may have include:

-   Collection input or output ports, meaning that an instance of the component will read or write, respectively, all data elements of a collection that will pass over the link coupled to the port. For a pair of components with a single link between their collection ports, the downstream component is generally permitted to read data elements as they are being written by an upstream component, enabling pipeline parallelism between upstream and downstream components. The data elements can also be reordered, which enables efficiency in parallelization, as described in more detail below. In some graphical representations, for example in programming graphical interfaces, such collection ports are generally indicated by a square port symbol at the component.
-   Scalar input or output ports, meaning that an instance of the component will read or write, respectively, at most one data element from or to a link coupled to the port. For a pair of components with a single link between their scalar ports, serial execution of the downstream component after the upstream component has finished executing is enforced using transfer of the single data element as a transfer of control. In some graphical representations, for example in programming graphical interfaces, such scalar ports are generally indicated by a triangle port symbol at the component.
-   Control input or output ports, which are similar to scalar inputs or outputs, but no data element is required to be sent, and are used to communicate transfers of control between components. For a pair of components with a link between their control ports, serial execution of the downstream component after the upstream component has finished executing is enforced (even if those components also have a link between collection ports). In some graphical representations, for example in programming graphical interfaces, such control ports are generally indicated by a circular port symbol at the component.

These different types of ports enable flexible design of data processing graphs, allowing powerful combinations of data and control flow with the overlapping properties of the port types. In particular, there are two types of ports, collection ports and scalar ports, that convey data in some form (called “data ports”); and there are two types of ports, scalar ports and control ports, that enforce serial execution (called “serial ports”). A data processing graph will generally have one or more components that are “source components” without any connected input data ports and one or more components that are “sink components” without any connected output data ports. Some components will have both connected input and output data ports. In some embodiments, the graphs are not permitted to have cycles, and each graph must therefore be a directed acyclic graph (DAG). This feature can be used to take advantage of certain characteristics of DAGs, as described in more detail below.
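
The overlap between data ports and serial ports, and the acyclicity requirement, can be illustrated with a minimal Python sketch. The names PortKind, DATA_PORTS, SERIAL_PORTS, and is_acyclic are hypothetical and are not part of the described system; the cycle check uses Kahn's algorithm, which is one conventional way to test for a DAG and is not necessarily what the compiler/interpreter 120 does.

```python
from collections import defaultdict, deque
from enum import Enum


class PortKind(Enum):
    COLLECTION = "collection"
    SCALAR = "scalar"
    CONTROL = "control"


# Collection and scalar ports convey data ("data ports").
DATA_PORTS = {PortKind.COLLECTION, PortKind.SCALAR}
# Scalar and control ports enforce serial execution ("serial ports").
SERIAL_PORTS = {PortKind.SCALAR, PortKind.CONTROL}


def is_acyclic(components, links):
    """Return True if the directed links (upstream, downstream) form no cycle.

    Illustrative check using Kahn's algorithm: repeatedly remove components
    that have no remaining incoming links.
    """
    indegree = {c: 0 for c in components}
    downstream = defaultdict(list)
    for up, down in links:
        downstream[up].append(down)
        indegree[down] += 1
    ready = deque(c for c, d in indegree.items() if d == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for nxt in downstream[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Every component is visited only when no cycle blocks it.
    return visited == len(components)
```

In this sketch a scalar port belongs to both sets, which mirrors the statement that scalar ports both convey data and enforce serial execution.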

The use of dedicated control ports on components of a data processing graph also enables flexible control of different parts of a computation that is not possible using certain other control flow techniques. For example, job control solutions that are able to apply dependency constraints between dataflow graphs don't provide the fine-grained control enabled by control ports that define dependency constraints between components within a single dataflow graph. Also, dataflow graphs that assign components to different phases that run sequentially don't allow the flexibility of sequencing individual components. For example, nested control topologies that are not possible using simple phases can be defined using the control ports and execution sets described herein. This greater flexibility can also potentially improve performance by allowing more components to run concurrently when possible.

By connecting different types of ports in different ways, a developer is able to specify different types of link configurations between ports of components of a data processing graph. One type of link configuration may correspond to a particular type of port being connected to the same type of port (e.g., a scalar-to-scalar link), and another type of link configuration may correspond to a particular type of port being connected to a different type of port (e.g., a collection-to-scalar link), for example. These different types of link configurations serve both as a way for the developer to visually identify the intended behavior associated with a part of the data processing graph, and as a way to indicate to the compiler/interpreter 120 a corresponding type of compilation process needed to enable that behavior. While the examples described herein use unique shapes for different types of ports to visually represent different types of link configurations, other implementations of the system could distinguish the behaviors of different types of link configurations by providing different types of links and assigning each type of link a unique visual indicator (e.g., thickness, line type, color, etc.). However, to represent the same variety of link configurations possible with the three types of ports listed above using link type instead of port type, there would be more than three types of links (e.g., scalar-to-scalar, collection-to-collection, control-to-control, collection-to-scalar, scalar-to-collection, scalar-to-control, etc.). Other examples could include different types of ports, but without explicitly indicating the port type visually within a data processing graph.

The compiler/interpreter 120 performs procedures to prepare a data processing graph for execution. A first procedure is an execution set discovery pre-processing procedure to identify a hierarchy of potentially nested execution sets of components. A second procedure is a control graph generation procedure to generate, for each execution set, a corresponding control graph that the compiler/interpreter 120 will use to form control code that will effectively implement a state machine at runtime for controlling execution of the components within each execution set. Each of these procedures will be described in greater detail below.

A component with at least one input data port specifies the processing to be performed on each input data element or collection (or tuple of data elements and/or collections on multiple of its input ports). One form of such a specification is as a procedure to be performed on one or a tuple of input data elements and/or collections. If the component has at least one output data port, it can produce corresponding one or a tuple of output data elements and/or collections. Such a procedure may be specified in a high level statement-based language (e.g., using Java source statements, or a Data Manipulation Language (DML) for instance as used in U.S. Pat. No. 8,069,129 “Editing and Compiling Business Rules”), or may be provided in some fully or partially compiled form (e.g., as Java bytecode). For example, a component may have a work procedure whose arguments include its input data elements and/or collections and its output data elements and/or collections, or more generally, references to such data elements or collections or to procedures or data objects (referred to herein as “handles”) that are used to acquire input and provide output data elements or collections.

Work procedures may be of various types. Without intending to limit the types of procedures that may be specified, one type of work procedure specifies a discrete computation on data elements according to a record format. A single data element may be a record from a table (or other type of dataset), and a collection of records may be all of the records in a table. For example, one type of work procedure for a component with a single scalar input port and a single scalar output port includes receiving one input record, performing a computation on that record, and providing one output record. Another type of work procedure may specify how a tuple of input records received from multiple scalar input ports is processed to form a tuple of output records sent out on multiple scalar output ports.
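
As a minimal illustration of these two kinds of work procedure, the following Python sketch treats a record as a dictionary. The function names, field names, and record layout are assumptions made for the example only; the description itself mentions Java and DML as possible specification languages.

```python
def uppercase_name(record):
    """Work procedure for one scalar input port and one scalar output port.

    Receives one input record, performs a computation on it, and provides
    one output record. The input record is left unmodified.
    """
    output = dict(record)
    output["name"] = record["name"].upper()
    return output


def join_order_and_customer(order, customer):
    """Work procedure taking a tuple of input records (one per scalar input
    port) and forming a tuple of output records (one per scalar output port).
    """
    invoice = {"order_id": order["id"], "customer": customer["name"]}
    audit = {"customer_id": customer["id"], "total": order["total"]}
    return invoice, audit
```

For example, uppercase_name({"id": 1, "name": "ada"}) yields a new record whose name field is "ADA".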

The semantic definition of the computation specified by the data processing graph is inherently parallel in that it represents constraints and/or lack of constraints on ordering and concurrency of processing of the computation defined by the graph. Therefore, the definition of the computation does not require that the result is equivalent to some sequential ordering of the steps of the computation. On the other hand, the definition of the computation does provide certain constraints that require sequencing of parts of the computation, and restrictions of parallel execution of parts of the computation.

In the discussion of data processing graphs, implementation of instances of components as separate “tasks” in a runtime system is assumed as a means of representing sequencing and parallelization constraints. A more specific implementation of the data processing graph into a task-based specification, which implements the computation consistently with the semantic definition, is discussed more fully after the discussion of the characteristics of the graph-based specification itself.

Generally, each component in a data processing graph will be instantiated in the computing platform a number of times during execution of the graph. The number of instances of each component may depend on which of multiple execution sets the component is assigned to. When multiple instances of a component are instantiated, more than one instance may execute in parallel, and different instances may execute in different computing nodes in the system. The interconnections of the components, including the types of ports, determine the nature of parallel processing that is permitted by a specified data processing graph.

Although in general state is not maintained between executions of different instances of a component, as discussed below, certain provisions are provided in the system for explicitly referencing persistent storage that may span executions of multiple instances of a component.

In examples where a work procedure specifies how a single record is processed to produce a single output record, and the ports are indicated to be collection ports, a single instance of the component may be executed, and the work procedure is iterated to process successive records to generate successive output records. In this situation, it is possible that state is maintained within the component from iteration to iteration.

In examples where a work procedure specifies how a single record is processed to produce a single output record, and the ports are indicated to be scalar ports, multiple instances of the component may be executed, and no state is maintained between executions of the work procedure for different input records.
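
The difference between the two cases can be sketched in Python as follows. The helpers run_with_collection_ports, run_with_scalar_ports, and number_records are hypothetical names, and the per-instance state is modeled as a plain dictionary purely for illustration; the instances in the scalar case are run one after another here, although the text permits them to run in parallel.

```python
def number_records(record, state):
    """Work procedure: tags a record with a counter kept in instance state."""
    state["count"] += 1
    return (state["count"], record)


def run_with_collection_ports(work, records):
    """Single component instance: the work procedure is iterated over the
    collection, so state can persist from iteration to iteration."""
    state = {"count": 0}
    return [work(record, state) for record in records]


def run_with_scalar_ports(work, records):
    """One component instance per record: each instance starts with fresh
    state, so nothing carries over between records."""
    return [work(record, {"count": 0}) for record in records]
```

With the same work procedure, the collection-port variant numbers the records 1, 2, 3, while the scalar-port variant numbers every record 1 because no state survives between instances.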

Also, in some embodiments, the system supports work procedures that do not follow the finest-grained specification introduced above. For example, a work procedure may internally implement an iteration, for example, which accepts a single record through a scalar port and provides multiple output records through a collection port.

As noted above, there are two types of data ports, collection ports and scalar ports, that convey data in some form; and there are two types of serial ports, scalar ports and control ports, that enforce serial execution. In some cases, a port of one type can be connected by a link to a port of another type. Some of those cases will be described below. In some cases, a port of one type will be linked to a port of the same type. A link between two control ports (called a “control link”) imposes serial execution ordering between linked components, without requiring data to be sent over the link. A link between two data ports (called a “data link”) provides data flow, and also enforces a serial execution ordering constraint in the case of scalar ports, and does not require serial execution ordering in the case of collection ports. A typical component generally has at least two kinds of ports including input and output data ports (either collection ports or scalar ports) and input and output control ports. Control links connect the control port of an upstream component to a control port of a downstream component. Similarly, data links connect the data port of an upstream component to a data port of a downstream component.

A graphical user interface can be used by developers to specify a specific data processing computation from a set of components, each of which carries out a particular task (e.g., a data processing task). The developer does so by assembling a data processing graph on a canvas area shown on a display screen. This involves placing the components on the canvas, connecting their various ports with appropriate links, and otherwise configuring the components appropriately.

An example of such a data processing graph that may be constructed in such a graphical user interface is shown in FIG. 2A. A series of components are shown with various types of ports interconnected by links. In some implementations, including the implementation illustrated in this example, the convention is that input data ports (both collection ports and scalar ports) are on the left side of a component, and output data ports (both collection ports and scalar ports) are on the right side of a component. Also, in this example, input control ports are on the top side of a component, and output control ports are on the bottom side of a component. While these conventions, along with the different port symbol shapes, help to convey to a developer the flow of data and control, it is still generally very difficult for a developer to ascertain at a glance how different components would be grouped into execution sets based on these conventions alone, even if the developer were very familiar with the known execution set discovery procedures (described in more detail below). So, the outlines described herein can be computed and visually presented within the user interface, as shown in FIG. 2B, clearly showing each nested group of components that belong in the same execution set. As is clearly evident by a comparison of FIGS. 2A and 2B, both the existence of the outlines, and the features of their contours (e.g., being selected not to overlap with each other), aid greatly in conveying functionality to a developer or other user.

The following simple example illustrates certain behavior in the context of components that have a single pair of collection ports and a single pair of control ports.

FIG. 2C shows an example in which a portion of a data processing graph being assembled includes a first component 210A with input and output control ports 212A, 214A, and input and output collection ports 216A, 218A. Control links 220A, 222A connect the input and output control ports 212A, 214A to control ports of other components in the data processing graph. Similarly, data links 224A, 226A connect the input and output collection ports 216A, 218A to ports of other components in the data processing graph. The collection ports 216A, 218A are represented in the figure with rectangular shape, whereas the control ports 212A, 214A are represented with circular shape.

In general, the input collection port 216A receives data to be processed by the component 210A, and the output collection port 218A provides data that has been processed by the component 210A. In the case of a collection port, this data is generally an unordered collection of an unspecified number of data elements. In a particular instance of the overall computation, the collection may include multiple data elements, or a single data element, or no data elements. In some implementations, a collection is associated with a parameter that determines whether the elements in the collection are unordered or ordered (and if ordered, what determines the ordering). As will be described in greater detail below, for an unordered collection, the order in which the data elements are processed by the component at the receiving side of the data link may be different from the order in which the component at the sending side of the data link provides those data elements. Thus, in the case of collection ports, the data link between them acts as a “bag” of data elements from which a data element may be drawn in an arbitrary order, as opposed to a “conveyor belt” that moves data elements from one component to another in a specific order.
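
The contrast between a “bag” and a “conveyor belt” can be sketched with two Python generators. The names conveyor_belt and bag, and the use of a seeded shuffle to stand in for an arbitrary delivery order, are assumptions for illustration only; an actual runtime would deliver elements in whatever order is convenient for parallel execution rather than shuffling them.

```python
import random


def conveyor_belt(elements):
    """Ordered link: elements arrive in exactly the order they were sent."""
    yield from elements


def bag(elements, seed=None):
    """Unordered link: the same elements arrive, but in an arbitrary order.

    A shuffle models that arbitrariness; 'seed' only makes a run repeatable.
    """
    pool = list(elements)
    random.Random(seed).shuffle(pool)
    yield from pool
```

A downstream component reading from bag([...]) can rely only on receiving every element once, not on the order in which the elements arrive.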

The control links are used to convey control information between control ports, which determines whether and when a component will begin execution. For example, the control link 222A either indicates that the component 210B is to begin execution after the component 210A has completed (i.e., in a serial order), or indicates that the component 210B is not to begin execution (i.e., is to be “suppressed”). Thus, while no data is sent over a control link, it can be viewed as sending a signal to the component on the receiving side. The way this signal is sent may vary depending on the implementation, and in some implementations may involve the sending of a control message between components. Other implementations may not involve sending an actual control message, but may instead involve a process directly invoking a process or calling a function associated with the task represented by the component on the receiving side (or omission of such invocation or function call in the case of suppression).

The ability to link control ports thus enables the developer to control the relative ordering among the different portions of a data processing computation represented by different components of the data processing graph. Additionally, providing this ordering mechanism using control ports on the components enables the mixing of logic associated with data flow and control flow. In effect, this enables data to be used to make decisions about control.

In the example shown in FIG. 2C, control ports connect to other control ports, and data ports connect to other data ports. However, the data on a data port inherently carries two different kinds of information. The first kind is the data itself, and the second is the existence of data at all. This second kind of information can be used as a control signal. As a result, it becomes possible to provide additional flexibility by enabling a scalar data port to be connected to a control port.

FIG. 2D shows an example data processing graph 230 that exploits the flexibility imparted by an ability to connect scalar ports to control ports.

The data processing graph 230 features a first component 231 labeled “Compute Date Info,” a second component 232 labeled “Do Monthly Report?”, a third component 233 labeled “Do Weekly Report,” a fourth component 234 labeled “Monthly Report,” a fifth component 235 labeled “Do Weekly Report?”, and a sixth component 236 labeled “Weekly Report.” The data processing graph 230 carries out a procedure that always produces either a daily report, a daily report and a weekly report, or all three kinds of report. The decision on which of these outcomes will occur depends on the evaluation of certain date information provided by the first component 231. Thus, FIG. 2D shows an example of data effectively in control of execution.

Execution begins when the first component 231 provides date information out its output scalar ports to the input scalar port of the second component 232 and to the input scalar port of the third component 233. The second component 232, which has no connected input control port, immediately goes to work. All other components, including the third component 233, have connected input control port(s) and must wait to be activated by a suitable positive control signal.

The second component 232 inspects this date information and determines whether it is appropriate to do a monthly report. There are two possible outcomes: either a monthly report is required, or it is not. Both the second component 232 and the third component 233 have two output scalar ports, and are configured to perform a selection function that provides a data element that acts as a positive control signal on one output scalar port (i.e., the selected port), and a negative control signal on the other output scalar port.

If, based on the date information, the second component 232 determines that no monthly report is required, the second component 232 sends a data element out its bottom output scalar port to the input control port of the third component 233. This data element is interpreted as a positive control signal that indicates to the third component 233 that the second component 232 has finished processing the data provided by the first component 231 and that the third component 233 may now begin processing its received date information data.

On the other hand, if the second component 232 determines that, based on the date information provided by the first component 231, a monthly report is required, it instead sends a data element that is interpreted as a positive control signal from its output scalar port to an input control port of the fourth component 234. Although the data element is more than just a control signal, the fourth component 234 treats it as a positive control signal because it is being provided to its input control port. The fourth component 234 ignores the actual data in the data element and just uses the existence of the data element as a positive control signal.

The fourth component 234 proceeds to create a monthly report. Upon completion, the fourth component 234 outputs a control signal from its output control port to an input control port of the third component 233. This tells the third component 233 that it (i.e., the third component 233) can now begin processing the date information that the first component 231 supplied to it.

Thus, the third component 233 will always eventually process the data provided by the first component 231 via its input scalar port. The only difference lies in which component triggers it to start processing: the second component 232 or the fourth component 234. This is because the two input control ports on the third component 233 will be combined using OR logic such that a positive control signal received at either port (or both) will trigger processing.

The remainder of the graph 230 operates in essentially the same way but with the third component 233 taking over the role of the second component 232 and the sixth component 236 taking over the role of the fourth component 234.

Upon being activated by a control signal at its input control ports, which comes either from the second component 232 or the fourth component 234, the third component 233 inspects the date information provided by the first component 231 over the data link connecting the first component 231 to the third component 233. If the third component 233 determines from the date information that no weekly report is required, it sends a data element interpreted as a positive control signal out of one of its output scalar ports to the input control port of the fifth component 235.

On the other hand, if the third component 233 determines that a weekly report is required, it sends a data element interpreted as a positive control signal out of its other output scalar port to an input control port of the sixth component 236. The sixth component 236 proceeds to create a weekly report. Upon completion, it sends a data element interpreted as a positive control signal from its output scalar port to an input control port of the fifth component 235.

The fifth component 235 will thus always eventually execute, with the only difference being whether the third component 233 or the sixth component 236 ultimately triggers it to begin execution. Upon receiving a control signal from either the third component 233 or the sixth component 236, the fifth component 235 creates the daily report.
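The triggering behavior of the graph 230 described above can be traced with a short simulation. The following Python sketch is illustrative only: the function name `run_graph_230` and the flags `monthly_due` and `weekly_due` are hypothetical, and the date inspection performed by components 232 and 233 is reduced to two booleans. It shows only the order in which components fire under the OR combination of input control ports.

```python
from typing import List


def run_graph_230(monthly_due: bool, weekly_due: bool) -> List[str]:
    """Return the component numbers of graph 230 in the order they fire.

    Hypothetical model: each decision made from the date information is
    passed in as a boolean instead of being computed from a date.
    """
    # Component 232 has no connected input control port, so it starts at once.
    order = ["232"]

    # 232 selects one output port for the positive control signal.
    signals_at_233 = [not monthly_due]      # direct link 232 -> 233
    if monthly_due:
        order.append("234")                 # 234 creates the monthly report
        signals_at_233.append(True)         # 234 signals 233 on completion

    # The input control ports of 233 are combined with OR logic.
    if any(signals_at_233):
        order.append("233")
        signals_at_235 = [not weekly_due]   # direct link 233 -> 235
        if weekly_due:
            order.append("236")             # 236 creates the weekly report
            signals_at_235.append(True)     # 236 signals 235 on completion
        if any(signals_at_235):
            order.append("235")             # 235 creates the daily report
    return order
```

In every combination of the two flags, components 233 and 235 appear in the result, which matches the observation that they always eventually execute regardless of which upstream component triggers them.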

An example is shown in FIG. 2E, which also shows the use of both scalar and collection data ports.

FIG. 2E shows a data processing graph 240 having a first component 241 labeled “Input File,” a second component 242 labeled “Get Filename From Request,” a third component 243 labeled “Read File,” a fourth component 244 labeled “Is Bad Record?”, a fifth component 245 labeled “Invalid Records,” a sixth component 246 labeled “Generate Bad Record Filename,” a seventh component 247 labeled “Any Validation Errors?”, and an eighth component 248 labeled “Send Alert.” This graph is intended to write bad records to a file and to send an alert upon detecting such a bad record.

The components 241 and 243 are examples of components that serve as sources of data, and component 245 is an example of a component that serves as a sink of data. The components 241 and 243 use as their source an input file that may be stored in any of a variety of formats in a filesystem (such as a local filesystem, or a distributed filesystem). An input file component reads the contents of a file and produces a collection of records from that file. A scalar input port (as shown on component 243) provides a data element that specifies the location of the file to be read (e.g., a path or a uniform resource locator) and the record format to be used. In some cases the location and record format may be provided as parameters to the input file component, in which case the input scalar port need not be connected to any upstream component and need not be shown (as for component 241). A collection output port (as shown on both component 241 and 243) provides the collection of records. Similarly, an output file component (such as component 245) would write a collection of records received over an input collection port to an output file (whose location and record format may optionally be specified by an input scalar port). An input file or output file component may also include a control input or output port that is linked to a control port of another component (such as component 245).

In the illustrated data processing graph 240, components that are within the larger dashed rectangle are part of an execution set. This execution set contains another execution set nested within it. This nested execution set, also shown within a dashed rectangle, contains only the fourth component 244. Execution sets are discussed in more detail below.

In operation, the first component 241 reads an input file. As it is executing, it provides the collection of records within the input file to the second component 242 via a data link from an output collection data port to an input collection data port of the second component 242. Different instances of the second component 242 and the other downstream components (which are in the same execution set) may be executed for each record in the collection, as will be described in more detail below. Since the second component 242 does not have anything connected to its control input, it immediately begins processing. Upon completion, the second component 242 provides a filename on its output scalar ports. This filename is received by both the third component 243 and the sixth component 246 at respective input scalar ports.

The third component 243 immediately reads the file identified by the filename and provides the content of the file on an output collection port for delivery to an input scalar port of an instance of the fourth component 244. Meanwhile, the sixth component 246 receives the same filename and outputs another filename, which it provides on output scalar ports connected to corresponding input scalar ports of both the fifth component 245 and the seventh component 247.

Upon receiving a filename from the sixth component 246 and the bad records from the fourth component 244, the fifth component 245 writes the bad records to the output file whose filename is identified by the sixth component 246.

The seventh component 247 is the only one not primed to execute upon receiving data at its data input port. When the fifth component 245 is finished writing to the output file, it sends a control signal out its control output port to the input control port of the seventh component 247. If the seventh component 247 determines that there was an error, it then provides data to the input scalar port of the eighth component 248. This causes the eighth component 248 to generate an alarm. This provides an example in which control ports are used to limit execution of certain components within a data processing graph.

It should be apparent that the ability to control processing in one component based on the state of another component carries with it the possibility of controlling processing when a set of multiple upstream components have all reached particular states. For example, a data processing graph can support multiple control links to or from the same control port. Alternatively, in some implementations, a component can include multiple input and output control ports. Default logic can be applied by the compiler/interpreter 120. The developer can also provide custom logic for determining how control signals will be combined. This can be done by suitably arranging combinatorial logic to apply to the various control links of the upstream components, and trigger startup of a component only when a certain logical state is reached (e.g., when all upstream components have completed, and when at least one has sent an activation control signal in the case of the default OR logic).

In general, a control signal can be a signal that triggers the commencement of processing or triggers the suppression of processing. The former is a “positive control signal” and the latter is a “negative control signal.” However, if combinatorial logic is used to determine whether or not a task should be invoked (triggering commencement of processing) it is possible for the logic to “invert” the usual interpretation, such that the task is invoked only when all inputs provide a negative control signal. Generally, the combinatorial logic may provide an arbitrary “truth table” for determining a next state in a state machine corresponding to the control graph described in more detail below.
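The combination of control signals described above can be sketched as ordinary functions over a list of signals. In the following Python sketch all names (`Signal`, `default_or`, `inverted`, `from_truth_table`) are hypothetical and are not part of the described system. A signal is modeled as `True` (positive), `False` (negative), or `None` (the upstream component has not yet completed), and each function returns `None` while the decision is still pending.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# True = positive signal, False = negative signal, None = upstream not finished.
Signal = Optional[bool]
Logic = Callable[[Sequence[Signal]], Optional[bool]]


def _pending(signals: Sequence[Signal]) -> bool:
    """A decision is deferred until every upstream component has completed."""
    return any(s is None for s in signals)


def default_or(signals: Sequence[Signal]) -> Optional[bool]:
    """Default OR logic: invoke if at least one completed input is positive."""
    if _pending(signals):
        return None
    return any(signals)


def inverted(signals: Sequence[Signal]) -> Optional[bool]:
    """Inverted logic: invoke only when all inputs are negative."""
    if _pending(signals):
        return None
    return not any(signals)


def from_truth_table(table: Dict[Tuple[bool, ...], bool]) -> Logic:
    """Build custom logic from an explicit truth table (assumed complete)."""
    def decide(signals: Sequence[Signal]) -> Optional[bool]:
        if _pending(signals):
            return None
        return table[tuple(signals)]
    return decide
```

A developer-supplied rule such as "exactly one positive input" would then be expressed as a four-row table passed to `from_truth_table`, while unmodified components would use `default_or`.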

An unconnected control port can be assigned a default state. In one embodiment, the default state corresponds to a positive control signal. As described in more detail below, this can be achieved by the use of implicit begin and end components in a control graph representing the data processing graph.

The different types of data ports on various components allow data to pass over links between components in different ways depending on the types of input and output ports that link those components. As described above, a scalar port represents production (for a scalar output port) or consumption (for a scalar input port) of at most a single data element (i.e., 0 or 1 data elements). In contrast, a collection port represents production (for a collection output port) or consumption (for a collection input port) of a set of potentially multiple data elements. By supporting both types of data ports in a single data processing graph, computing resources can be allocated more efficiently and a more complex control flow and data flow can be generated between tasks, allowing a developer to easily indicate the behavior that is desired.

Referring to FIG. 3A, a data processing graph 300 includes a series of three connected components, a first component (A1) 302, a second component (B1) 304, and a third component (C1) 306. The first component 302 includes a collection type input port 308 and a scalar type output port 310. The second component 304 includes a scalar type input port 312 and a scalar type output port 314. The third component 306 includes a scalar type input port 316 and a collection type output port 318.

A first link 320 connecting the scalar output port 310 of the first component 302 to the scalar input port 312 of the second component 304 both allows data to pass between the first component 302 and the second component 304 and at the same time enforces serial execution of the first and second components 302, 304. Similarly, a second link 322 connecting the scalar output port 314 of the second component 304 to the scalar input port 316 of the third component 306 both allows data to pass between the second component 304 and the third component 306 and enforces serial execution of the second and third components 304, 306.

Due to the interconnections of the scalar ports in FIG. 3A, the second component 304 begins executing only after the first component 302 completes (and passes a single data element over the first link 320), and the third component 306 begins executing only after the second component 304 completes (and passes a single data element over the second link 322). That is, each of the three components in the data processing graph runs once in the strict sequence A1/B1/C1.

In some examples, one or more of the components can be placed into a suppressed state, meaning that the one or more components do not execute and therefore do not pass any data elements out of their output ports. Enabling components to be suppressed avoids wasted resources, for example, by ensuring that components that will not perform any useful processing do not need computing resources (e.g., processes or memory) to be devoted to them. Any components with scalar input ports connected only to the output ports of suppressed components do not execute since they receive no data. For example, if the first component 302 is placed in a suppressed state then the scalar input port 312 of the second component 304 receives no data from the scalar output port 310 of the first component 302, and the second component 304 therefore does not execute. Since the second component 304 does not execute, the scalar input port 316 of the third component 306 receives no data from the scalar output port 314 of the second component 304, and the third component 306 also does not execute. Thus, the data passed between two scalar ports also acts as a positive control signal similar to the signal sent between two linked control ports.
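The way suppression propagates down a chain of scalar-to-scalar links, as in FIG. 3A, can be illustrated with a small sketch. The function `run_scalar_chain` below is hypothetical and models only a linear chain in which each component has a single scalar input; it assumes that the input of the first component is available.

```python
from typing import Iterable, List, Set


def run_scalar_chain(components: Iterable[str], suppressed: Set[str]) -> List[str]:
    """Return the components of a scalar-linked chain that actually execute.

    A component executes only if it is not suppressed and its upstream
    neighbor passed a data element over the scalar link.
    """
    executed = []
    data_available = True  # assumption: the first component's input is present
    for name in components:
        if name in suppressed or not data_available:
            # No execution means no data element on the outgoing scalar link.
            data_available = False
            continue
        executed.append(name)
        data_available = True
    return executed
```

With nothing suppressed the chain runs in the strict sequence A1, B1, C1; suppressing A1 leaves every downstream component without input, so nothing executes.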

In the exemplary data processing graph of FIG. 3A, the input port 308 of the first component 302 and the output port 318 of the third component 306 happen to be collection ports, which have no effect on the serial execution behavior of the first, second, and third components 302, 304, 306 imposed by the scalar ports connecting them.

In general, collection ports are used both to pass a collection of data elements between components and at the same time may give the runtime system a license to reorder the data elements within that set. Reordering of data elements of an unordered collection is allowed because there is either no dependence on the state of the computation from one data element to another data element, or if there is global state that is accessed as each data element is processed the final state is independent of the order in which those data elements were processed. This license to reorder provides flexibility for delaying decisions about parallelization until runtime.

Referring to FIG. 3B, a data processing graph 324 includes a series of three connected components, a first component (A2) 326, a second component (B2) 328, and a third component (C2) 330. The first component 326 includes a collection type input port 332 and a collection type output port 334. The second component 328 includes a collection type input port 336 and a collection type output port 338. The third component 330 includes a collection type input port 340 and a collection type output port 342.

Each of the three components 326, 328, 330 specifies how a collection of one or more input elements is processed to generate a collection of one or more output elements. There is not necessarily a one-to-one correspondence between a particular input element and a particular output element. For example, a number of data elements in a first collection of data elements 344 between the first component 326 and the second component 328 may be different than a number of elements in a second collection of data elements 346 between the second component 328 and the third component 330. The only constraint placed on connections between collection ports is that each data element in the collection is passed from one collection port to another, while allowing arbitrary reordering between the first component 326 and the second component 328 and between the second component 328 and the third component 330 with respect to the order in which they are processed. Alternatively, in other examples, the collection ports can optionally be configured to preserve order. In this example, the three components 326, 328, 330 start up together and run concurrently, allowing pipeline parallelism.

The compiler/interpreter 120 described in relation to FIG. 1 is configured to recognize collection port to collection port connections and translate the computation into executable code in a manner that is appropriate for the computation being performed. The unordered nature of the collection data link gives the compiler/interpreter 120 flexibility in how this is accomplished. For example, if it happens to be the case that, for the second component 328, each output element is computed based on a single input element (i.e., there is no state maintained across data elements), the compiler/interpreter 120 may allow the runtime system to dynamically parallelize the processing of the data elements by instantiating as many as one instance of the component per data element (e.g., depending on the computing resources available at runtime). Optionally, state can be maintained across data elements in components that have input collection ports in special cases. But in the general case, the runtime system can be allowed to parallelize the component's task. For example, if the runtime system detects that no global state is being maintained, it may be allowed to parallelize the task. Some components can also be configured to support maintaining state, in which case parallelization may be disallowed. If the collection is unordered, the fact that order does not need to be preserved among data elements means that each instance of the second component 328 can provide its output data element to the third component 330 as soon as it is available, and the third component 330 can begin processing those data elements before all instances of the second component 328 have finished.

In some examples, a graph developer can explicitly indicate that the processing of the data elements in a collection of data may be dynamically parallelized by connecting a collection type output port of one component to a scalar type input port of another component. Such an indication also requires that state is not maintained between processing of different elements of the collection. Referring to FIG. 3C, a data processing graph 348 includes a series of three connected components, a first component (A3) 350, a second component (B3) 352, and a third component (C3) 354. The first component 350 includes a collection type input port 356 and a collection type output port 358. The second component 352 includes a scalar type input port 360 and a scalar type output port 362. The third component 354 includes a collection type input port 364 and a collection type output port 366.

The collection type output port 358 of the first component 350 is connected to the scalar type input port 360 of the second component 352 by a first link 368 and the scalar type output port 362 of the second component 352 is connected to the collection type input port 364 of the third component 354 by a second link 370. As is described in greater detail below, a link from a collection type output port to a scalar type input port implies an entry point into an execution set and a link from a scalar type output port to a collection type input port implies an exit point of an execution set. Very generally, as is described in greater detail below, components included in an execution set may be dynamically parallelized by the runtime controller to process data elements from a collection of data elements.

In FIG. 3C, the link 368 between the collection type output port 358 of the first component 350 and the scalar type input port 360 of the second component 352 implies an entry point into an execution set. The link 370 between the scalar type output port 362 of the second component 352 and the collection type input port 364 of the third component 354 implies an exit point of the execution set. That is, the second component 352 is the only component in the execution set.

Since the second component 352 is included in the execution set, a separate instance of the second component 352 is launched for each data element received from the collection type output port 358 of the first component 350. At least some of the separate instances may run in parallel, depending on decisions that may not be made until runtime. In this example the first (350) and third (354) components start up together and run concurrently, while the second component (352) runs once for each data element within the collection received over the link 368. Alternatively, the second component 352 can run once for each tuple of multiple data elements within the collection.
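The per-element launching of the second component 352 in FIG. 3C can be approximated with a thread pool. The Python sketch below is only an analogy under stated assumptions: `run_execution_set` and `b3` are hypothetical names, the body of `b3` is a placeholder, and a real runtime would decide the degree of parallelism itself. The standard-library calls `ThreadPoolExecutor.submit` and `as_completed` are used so that outputs are gathered in completion order, reflecting the unordered collection delivered to the third component 354.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_execution_set(collection: Iterable[T],
                      component: Callable[[T], R],
                      max_workers: int = 4) -> List[R]:
    """Launch one instance of `component` per data element of the collection.

    Results are gathered as instances finish, so the output order is not
    guaranteed to match the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(component, element) for element in collection]
        return [future.result() for future in as_completed(futures)]


def b3(element: int) -> int:
    """Hypothetical stateless stand-in for component B3 (352)."""
    return element * 2
```

Because no state is kept between calls to `b3`, the instances are independent, which is the condition the text places on this kind of collection-to-scalar connection. Setting `max_workers=1` corresponds to a runtime decision not to parallelize.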

2 Execution Sets

As is described above in relation to FIG. 1, the compiler/interpreter 120 performs an execution set discovery pre-processing procedure on a data processing graph to prepare the data processing graph for execution. In a general sense, as used herein, the term “execution set” refers to a set of one or more components that can be invoked as a unit and applied to a portion of data, such as a portion of the data elements of an output collection port. Therefore, at most one instance of each component in the execution set is executed for each input data element (or tuple of multiple input data elements presented to one or more input ports of the execution set). Within the execution set, sequencing constraints are imposed by the links to scalar and control ports, with parallel execution of the components in an execution set being permissible as long as the sequencing constraints are not violated. The code prepared for an execution set by the compiler/interpreter 120 may include embedded information (e.g., an annotation or modifier) that indicates how the tasks corresponding to the components are to be performed when the code is executed (e.g., degree of parallelism). In examples in which one instance of an execution set is executed for a tuple of multiple data elements in a received collection, the tuple may consist of a fixed number of data elements, for example, or a number of data elements that share some characteristic (e.g., a common key value). In examples in which there are at least some components that are permitted to execute in parallel, the execution set may be implemented using multiple tasks, for example, a task for the execution set as a whole, and one or more sub-tasks for concurrent execution of an instance of one or more of the components. Therefore, tasks representing different instances of the execution set may themselves be broken down into even finer-grained tasks, for example, with sub-tasks that may execute concurrently. Tasks for different execution sets may generally be executed independently and in parallel. So if a large dataset has a million records, for example, there may be a million independent tasks. Some of the tasks may be executed on different nodes 152 of the computing platform 150. Tasks may be executed using lightweight threads that may be efficiently executed concurrently, even on a single node 152.

In general, the execution sets identified by the assignment algorithm(s) (i.e., the execution sets other than a root execution set) receive data elements through a “driving” scalar data port at the boundary of the execution set. For each data element received at the driving input scalar data port of the execution set, each of the components within the execution set is executed once (if activated) or not at all (if suppressed). Multiple instances of the execution set can be instantiated and executed in parallel to process multiple data elements that are available to the execution set from an upstream collection port. A degree of parallelism for an execution set can be determined at runtime (and includes a possible decision not to parallelize the execution set), and is limited only by the computational resources available at runtime. The individual outputs of the independent instances of the execution set are gathered at output port(s) of the execution set, regardless of order, and are made available to downstream components. Alternatively, in other embodiments, execution sets other than the root execution set can be recognized (in some cases, based on user input) that do not require a driving input scalar data port. Such an execution set without a driving input scalar data port can be executed, using the procedures described herein, in a single instance if appropriate (e.g., for a latched execution set described below), or in multiple instances in parallel. For example, a parameter can be set that determines a number of times an execution set will execute, and/or a number of parallel instances of the execution set that will execute.

Very generally, the execution set discovery procedure uses an assignment algorithm that determines subsets of components within the data processing graph that are to be applied as a set to input elements of an unordered collection of data elements. The assignment algorithm traverses the data processing graph and assigns each component to a subset based on assignment rules. As is made apparent in the following examples, a given data processing graph may include a number of execution sets nested at different levels of an execution set hierarchy.

In the data processing graphs described herein, there are two types of data ports: scalar data ports and collection data ports. In general, a pair of linked components (i.e., upstream component A 402 and downstream component B 404 of FIGS. 4A to 4D) will be in the same execution set by default if they are connected by a link between ports of the same type (unless they are in different execution sets for another reason). In FIG. 4A, component A 402 has an output port 406 with a scalar type and component B 404 has an input port 408 with a scalar type. Since the link 410 between component A 402 and component B 404 connects two scalar type ports, components A and B 402, 404 are in the same execution set in this example. In FIG. 4A, since the link between component A 402 and component B 404 is a scalar-to-scalar link, either 0 data elements or 1 data element is passed between upstream component A 402 and downstream component B 404 over the link 410. Upon completion of upstream component A's 402 processing, a data element is passed over the link 410, unless upstream component A 402 is suppressed (as is described above), in which case no data element is passed over the link 410.

Referring to FIG. 4B, component A 402 has an output port 412 with a collection type and component B 404 has an input port 414 with a collection type. Since the link 410 between component A 402 and component B 404 connects two collection type ports, component A 402 and component B 404 are also in the same execution set in this example. In FIG. 4B, since the link 410 between component A 402 and component B 404 is a collection-to-collection link, a set of data elements is passed between the upstream and downstream components over the link 410.

When there is a mismatch between the port types on either end of a link, there is an implicit change in a level of the execution set hierarchy. In particular, mismatched ports represent entry points or exit points to an execution set at a particular level of the execution set hierarchy. In some examples, an execution set entry point is defined as a link between a collection type output port and a scalar type input port. In FIG. 4C, one example of an execution set entry point 424 is illustrated at the link 410 between component A 402 and component B 404, since the output port 416 of component A 402 is a collection type port and the input port 418 of component B 404 is a scalar type port.

In some examples, an execution set exit point is defined as a link between a scalar type output port and a collection type input port. Referring to FIG. 4D, one example of an execution set exit point 426 is illustrated at the link 410 between component A 402 and component B 404, since the output port 420 of component A 402 is a scalar type port and the input port 422 of component B 404 is a collection type port.

The assignment algorithm implemented prior to compilation and/or interpretation by the compiler/interpreter 120 uses execution set entry and execution set exit points to discover the execution sets present in the data processing graph.
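The four port pairings of FIGS. 4A to 4D reduce to a small classification of each link by the types of the ports at its two ends. The Python sketch below illustrates that classification; the names `PortType`, `LinkKind`, and `classify_link` are hypothetical and chosen only for this example.

```python
from enum import Enum


class PortType(Enum):
    SCALAR = "scalar"
    COLLECTION = "collection"


class LinkKind(Enum):
    SAME_SET = "same execution set (by default)"
    ENTRY_POINT = "execution set entry point"
    EXIT_POINT = "execution set exit point"


def classify_link(output_port: PortType, input_port: PortType) -> LinkKind:
    """Classify a link by the upstream output and downstream input port types."""
    if output_port == input_port:
        # FIG. 4A (scalar to scalar) and FIG. 4B (collection to collection).
        return LinkKind.SAME_SET
    if output_port is PortType.COLLECTION:
        # FIG. 4C: collection output feeding a scalar input.
        return LinkKind.ENTRY_POINT
    # FIG. 4D: scalar output feeding a collection input.
    return LinkKind.EXIT_POINT
```

Note that `SAME_SET` is only the default: as stated above, two components joined by matching ports may still fall into different execution sets for another reason, which this sketch does not model.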

2.1 Stack Based Assignment Algorithm

For illustration purposes, in a first example, a data processing graph has a simple, one-dimensional graph structure, and a simpler assignment algorithm is illustrated using a stack based algorithm. In the stack based assignment algorithm, each component in the data processing graph is labeled with one or more “ID strings,” which are made up of integers separated by a separation character ‘/’. The number of times the separation character ‘/’ appears in the ID string for a given component determines the level of the component in the execution set hierarchy. In some examples, a component may have multiple input links and therefore may have multiple ID strings. In such cases, the algorithm has rules, described in greater detail below, for determining which ID string to use.

In one example of the stack based assignment algorithm, the compiler/interpreter 120 walks the data processing graph, in an upstream to downstream direction, according to the following procedure. Initially, the most upstream component(s) are each labeled with an ID string of ‘0’, indicating a component at the root level of the execution set hierarchy.

The links and components on a path from the most upstream component to the most downstream component are then traversed. If a link between a collection type output port of an upstream component and a collection type input port of a downstream component is encountered, the ID string of the upstream component is propagated to the downstream component. Similarly, if a link between a scalar type output port of an upstream component and a scalar type input port of a downstream component is encountered, the ID string of the upstream component is propagated to the downstream component.

If a link between a collection type output port of an upstream component and a scalar type input port of a downstream component is encountered, the downstream component is assigned a label including the label of the upstream component with ‘/n’ appended to its end, where n is 1+<max of all existing ID string integers>. If a link between a scalar type output port of an upstream component and a collection type input port of a downstream component is encountered, the downstream component is assigned a label including the label of the upstream component with its right-most ID string integer (and its separation character ‘/’) removed.

In some examples, various conditions may be considered illegal and will cause an error in the algorithm (e.g., if a component has two different ID strings at the same level of the execution set hierarchy, or the presence of a cycle in an execution set).

Referring to FIG. 5, the stack based assignment algorithm described above is applied to an exemplary data processing graph 550, resulting in the discovery of two execution sets (in addition to a Root, “Level 0” execution set 551): a first “Level 1” execution set 570 and a second “Level 2” execution set 572 nested within the first “Level 1” execution set 570. To arrive at the discovery of the two execution sets 570, 572, the stack based assignment algorithm first labels the most upstream component, a first dataset 556, with an ID string of ‘0.’ The stack based assignment algorithm then traverses the components and links of the one-dimensional path through the data processing graph 550. In traversing the path, the stack based assignment algorithm first traverses the link from the first dataset 556 to a first component 558. Since the output port of the first dataset 556 is a collection type output port and the input port of the first component 558 is a scalar type input port, the first component 558 is assigned an ID string of ‘0/1’ which is the ID string of the first dataset 556 with ‘/1’ appended to its end, where 1 is the max of all existing ID string integers +1. In general, appending ‘/1’ to the ID string of the first component 558 is an indication of a transition from the Root, “Level 0” execution set 551 to the “Level 1” execution set 570. In some examples, this transition is represented using a first execution set entry point indicator 557.

The assignment algorithm then traverses the link from the first component 558 to a second component 560. Since the output port of the first component 558 is a collection type output port and the input port of the second component 560 is a scalar type input port, the second component 560 is assigned an ID string of ‘0/1/2’ which is the ID string of the first component 558 with ‘/2’ appended to its end, where 2 is the max of all existing ID string integers +1. In general, appending ‘/2’ to the ID string of the second component 560 is an indication of a transition from the “Level 1” execution set 570 to the “Level 2” execution set 572. In some examples, this transition is represented using a second execution set entry point indicator 559.

The assignment algorithm then traverses the link from the second component 560 to a third component 562. Since the output port of the second component 560 is a scalar type output port and the input port of the third component 562 is a scalar type input port, the ID string (i.e., ‘0/1/2’) of the second component 560 is propagated to the third component 562.

The assignment algorithm then traverses the link from the third component 562 to a fourth component 564. Since the output port of the third component 562 is a scalar type output port and the input port of the fourth component 564 is a collection type input port, the fourth component 564 is assigned an ID string of ‘0/1’ which is the ID string of the third component 562 with its right-most ID string integer ‘2’ (and its separation character ‘/’) removed. In general, removing the ‘/2’ from the ID string of the third component 562 is an indication of a transition from the “Level 2” execution set 572 to the “Level 1” execution set 570. In some examples, this transition is represented using a first execution set exit point indicator 563.

The assignment algorithm then traverses the link from the fourth component 564 to a fifth component 566. Since the output port of the fourth component 564 is a scalar type output port and the input port of the fifth component 566 is a collection type input port, the fifth component 566 is assigned an ID string of ‘0’ which is the ID string of the fourth component 564 with its right-most ID string integer (and its separation character ‘/’) removed. In general, removing the ‘/1’ from the ID string of the fourth component 564 is an indication of a transition from the “Level 1” execution set 570 to the Root, “Level 0” execution set 551. In some examples, this transition is represented using a second execution set exit point indicator 565.

Finally, the assignment algorithm traverses the link from the fifth component 566 to a second dataset 568. Since the output port of the fifth component 566 is a collection type output port and the input port of the second dataset 568 is a collection type input port, the ID string of the fifth component 566 (i.e., ‘0’) is propagated to the second dataset 568.

In some examples, in addition to entry point indicators and exit point indicators, the change between the flow of collections of data elements and individual scalar data elements can be visually represented using additional visual cues within the user interface. For example, the line representing a link can be thicker between a collection port and an indicator and thinner between an indicator and a scalar port.

The result of the stack based assignment algorithm includes a version of the data processing graph 550 where each of the components is labeled with an ID string. In the example of FIG. 5, the first dataset 556, the second dataset 568, and the fifth component 566 are all labeled with the ID string ‘0.’ The first component 558 and the fourth component 564 are labeled with the ID string ‘0/1.’ The second component 560 and the third component 562 are each labeled with the ID string ‘0/1/2.’

Each unique ID string represents a unique execution set in the execution set hierarchy. Those components with the ID string ‘0’ are grouped into the Root, “Level 0” execution set 551 in the execution hierarchy. Those components with the ID string ‘0/1’ are grouped into the “Level 1” execution set 570, which is nested within the root execution set 551 (where ‘0/1’ can be read as execution set 1 nested within execution set 0). Those components with the ID string ‘0/1/2’ are grouped into a “Level 2” execution set 572, which is nested within both the Root, “Level 0” execution set 551 and the “Level 1” execution set 570.
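The FIG. 5 walkthrough can be restated as a short Python sketch. This is only an illustration under stated assumptions, not the implementation described in this specification: the function name `assign_ids_on_path`, the port kind strings, and the use of reference numerals as component names are all hypothetical. A collection-to-scalar link appends a fresh integer to the current ID string, a scalar-to-collection link removes the right-most integer, and any other link propagates the ID string unchanged.

```python
from typing import Dict, List, Tuple

COLLECTION = "collection"  # hypothetical port kind labels
SCALAR = "scalar"

def assign_ids_on_path(first: str,
                       links: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Assign ID strings along a one dimensional path.

    links holds (upstream output port kind, downstream input port kind,
    downstream component name) in path order.
    """
    ids = {first: "0"}      # the most upstream component is labeled '0'
    current = "0"
    next_int = 1            # max of all existing ID string integers + 1
    for out_kind, in_kind, name in links:
        if out_kind == COLLECTION and in_kind == SCALAR:
            # Entry point: descend one level in the hierarchy.
            current = f"{current}/{next_int}"
            next_int += 1
        elif out_kind == SCALAR and in_kind == COLLECTION:
            # Exit point: drop the right-most integer and its separator.
            if "/" not in current:
                raise ValueError("exit point found at the root level")
            current = current.rsplit("/", 1)[0]
        ids[name] = current  # otherwise the ID string simply propagates
    return ids

# Port kinds as stated in the FIG. 5 walkthrough.
fig5_ids = assign_ids_on_path("556", [
    (COLLECTION, SCALAR, "558"),
    (COLLECTION, SCALAR, "560"),
    (SCALAR, SCALAR, "562"),
    (SCALAR, COLLECTION, "564"),
    (SCALAR, COLLECTION, "566"),
    (COLLECTION, COLLECTION, "568"),
])
```

Under these assumptions, `fig5_ids` groups 556, 566 and 568 under ‘0’, 558 and 564 under ‘0/1’, and 560 and 562 under ‘0/1/2’.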

2.2 Global Mapping Based Assignment Algorithm

In some examples, for more general data processing graphs, the stack based assignment algorithm may not be sufficient for correctly determining the execution set hierarchy. For example, in general data processing graphs, any given component can have multiple input ports and/or multiple output ports, rendering general data processing graphs incompatible with the stack based approach. In such examples, a global mapping based assignment algorithm is used to determine the execution set hierarchy.

The global mapping based assignment algorithm leverages the fact that data processing graphs are constrained to be directed acyclic graphs. Directed acyclic graphs can be processed using a topological sorted order, ensuring that each component of the graph is only processed after all of the components immediately upstream of the component have been processed. Since all of the components immediately upstream of the component are known to have been processed, the ID string for the component can be determined by choosing the ID string of the most deeply nested (in the execution set hierarchy) component that is directly upstream from the component.

In some examples, the global mapping based assignment algorithm uses a standard topological sorting algorithm such as Kahn's algorithm to obtain a topological sorted order for a given data processing graph. Kahn's algorithm is summarized by the following pseudo-code:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
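For readers who prefer runnable code, the pseudo-code above can be sketched in Python as follows. The function name `kahn_topological_sort` and the edge-list input format are assumptions made for illustration. Rather than deleting edges from the graph, the sketch decrements a per-node count of unprocessed incoming edges, which has the same effect.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple

def kahn_topological_sort(nodes: Iterable[Hashable],
                          edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Return the nodes in a topologically sorted order (Kahn's algorithm)."""
    nodes = list(nodes)
    indegree: Dict[Hashable, int] = {n: 0 for n in nodes}
    successors: Dict[Hashable, list] = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)  # the set S
    order = []                                           # the list L
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1          # "remove edge e from the graph"
            if indegree[m] == 0:      # m has no other incoming edges
                ready.append(m)

    if len(order) != len(nodes):      # some edges were never removed
        raise ValueError("graph has at least one cycle")
    return order
```

A node appears in the result only after every node with an edge into it, which is the property the global mapping based assignment algorithm relies on.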

After determining the topological sorted order, the global mapping based assignment algorithm traverses the components of the data processing graph in the topological sorted order to determine the proper ID string (or simply an ID number) for each of the components. In particular, as the components are traversed, every component of the data processing graph copies its ID string to its output port. Components that are immediately downstream from an upstream component and are not separated from the upstream component by an execution set entry point or an execution set exit point read the ID string from the upstream component's output port and use the ID string as their ID string.

For an upstream component that is separated from a downstream component by an execution set entry point, a new ID string is allocated at the execution set entry point and is provided to the downstream component for use as its ID string. A mapping of the ID string of the upstream component to the ID string of the downstream component (i.e., a parent/child mapping) is stored in a global mapping data store for later use.

For an upstream component that is separated from a downstream component by an execution set exit point, the ID string at the output port of the upstream component is read by the execution set exit point. The global mapping data store is then queried to determine the parent ID string for the ID string at the output port. The parent ID string is provided to the downstream component for use as its ID string.

Referring to FIG. 6, an exemplary general, two dimensional data processing graph 628 is analyzed using the global mapping based assignment algorithm described above. The data processing graph 628 includes a first dataset (D1) 632, a first component (C1) 638, a second component (C2) 640, a third component (C3) 645, a fourth component (C4) 646, a fifth component (C5) 642, a sixth component (C6) 644, and a second dataset (D2) 634. Before assigning ID strings to individual components of the data processing graph 628, a topological sorting algorithm (e.g., Kahn's algorithm) is applied to the data processing graph, resulting in a topological sorted order of: D1, C1, C2, C3, C4, C5, C6, D2.

With the topological sorted order determined, the global mapping based assignment algorithm traverses the components of the data processing graph in the topological sorted order to determine the proper ID string for each of the components, resulting in the discovery of a “Level 1” execution set 630 and a “Level 2” execution set 631 (in addition to the Root, “Level 0” execution set). To arrive at the discovery of the two execution sets 630, 631, the global mapping based assignment algorithm first labels the most upstream component, a first dataset (D1) 632, with an ID string of ‘0.’ The global mapping based assignment algorithm then traverses the components and links of the data processing graph 628 in the topological sorted order.

The global mapping based assignment algorithm first traverses the link from the first dataset (D1) 632 to the first component (C1) 638. Since the output port of the first dataset (D1) 632 is a collection type output port and the input port of the first component (C1) 638 is a collection type input port, no execution set entry point or exit point is identified and the ID string (i.e., ‘0’) of the first dataset (D1) 632 is read from the output port of the first dataset (D1) 632 and assigned to the first component (C1) 638.

The assignment algorithm then traverses the link between the first component (C1) 638 and the second component (C2) 640. Since the output port of the first component (C1) 638 is a collection type output port and the input port of the second component (C2) 640 is a scalar type input port, a first execution set entry point 639 is identified between the two components 638, 640. At the first execution set entry point 639, a new ID string (i.e., ‘1’) is allocated and assigned as the ID string of the second component (C2) 640. A mapping 653 of the parent ID string (i.e., ‘0’) for the first execution set entry point 639 to the child ID string (i.e., ‘1’) for the first execution set entry point 639 is stored in a global mapping data store 649 for later use.

The assignment algorithm then traverses the link from the second component (C2) 640 to the third component (C3) 645. Since the output port of the second component (C2) 640 is a collection type output port and the input port of the third component (C3) 645 is a scalar type input port, a second execution set entry point 641 is identified between the two components 640, 645. At the second execution set entry point 641, a new ID string (i.e., ‘2’) is allocated and assigned as the ID string of the third component (C3) 645. A mapping 651 of the parent ID string (i.e., ‘1’) for the second execution set entry point 641 to the child ID string (i.e., ‘2’) for the second execution set entry point 641 is stored in the global mapping data store 649 for later use.

The assignment algorithm then traverses the link from the third component (C3) 645 to the fourth component (C4) 646. Since the output port of the third component (C3) 645 is a collection type output port and the input port of the fourth component (C4) 646 is a collection type input port, no execution set entry point or exit point is identified and the ID string (i.e., ‘2’) of the third component (C3) 645 is read from the output port of the third component (C3) 645 and assigned to the fourth component (C4) 646.

The assignment algorithm then traverses the link from the fourth component (C4) 646 to the fifth component (C5) 642. Since the output port of the fourth component (C4) 646 is a scalar type output port and the input port of the fifth component (C5) 642 is a collection type input port, a first execution set exit point 647 is identified between the two components 646, 642. At the first execution set exit point 647, the ID string of the fourth component (C4) 646 is read from the output port of the fourth component (C4) 646 and is used to query the global mapping data store 649. The global mapping data store 649 returns the parent/child relationship 651 (i.e., ‘1/2’) stored in association with the second execution set entry point 641. The parent ID string (i.e., ‘1’) of the parent/child relationship 651 is assigned as the ID string for the fifth component (C5) 642.

The assignment algorithm then traverses the link from the fifth component (C5) 642 to the sixth component (C6) 644. Since the output port of the fifth component (C5) 642 is a scalar type output port and the input port of the sixth component (C6) 644 is a collection type input port, a second execution set exit point 643 is identified between the two components 642, 644. At the second execution set exit point 643, the ID string of the fifth component (C5) 642 is read from the output port of the fifth component (C5) 642 and is used to query the global mapping data store 649. The global mapping data store 649 returns the parent/child relationship 653 (i.e., ‘0/1’) stored in association with the first execution set entry point 639. The parent ID string (i.e., ‘0’) of the parent/child relationship 653 is assigned as the ID string for the sixth component (C6) 644.

Finally, the assignment algorithm traverses the link from the sixth component (C6) 644 to the second dataset (D2) 634. Since the output port of the sixth component (C6) 644 is a collection type output port and the input port of the second dataset (D2) 634 is a collection type input port, no execution set entry point or exit point is identified and the ID string (i.e., ‘0’) of the sixth component (C6) 644 is read from the output port of the sixth component (C6) 644 and assigned to the second dataset (D2) 634.

The result of the global mapping based assignment algorithm includes a version of the data processing graph 628 where each of the components is labeled with an ID string. In the example of FIG. 6, the first dataset (D1) 632, the first component (C1) 638, the sixth component (C6) 644, and the second dataset (D2) 634 are all labeled with the ID string ‘0.’ The second component (C2) 640 and the fifth component (C5) 642 are both labeled with the ID string ‘1.’ The third component (C3) 645 and the fourth component (C4) 646 are both labeled with the ID string ‘2.’

Each unique ID string represents a unique execution set in the execution set hierarchy. Those components with the ID string ‘0’ are grouped into the Root, “Level 0” execution set 629 in the execution hierarchy. Those components with the ID string ‘1’ are grouped into a “Level 1” execution set 630, which is nested within the root execution set 629. Those components with the ID string ‘2’ are grouped into the “Level 2” execution set 631, which is nested within the Root, “Level 0” execution set 629 and further within the “Level 1” execution set 630.
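The global mapping procedure walked through for FIG. 6 can be sketched in Python as follows. This is a simplified illustration, not the implementation described here. The function name `assign_execution_sets`, the link tuple layout, and the port kind strings are hypothetical. The sketch also allocates one new ID per entry link, whereas a fuller version would share an ID among several entry points of the same execution set. Only the links named in the walkthrough are modeled.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str, str]  # (upstream, output kind, downstream, input kind)

def assign_execution_sets(order: List[str], links: List[Link]):
    """Label components in topological order and record parent/child IDs."""
    parent_of: Dict[str, str] = {}   # global mapping data store: child -> parent
    ids: Dict[str, str] = {}
    next_id = 1
    incoming: Dict[str, list] = {}
    for up, out_kind, down, in_kind in links:
        incoming.setdefault(down, []).append((up, out_kind, in_kind))

    def depth(id_string: str) -> int:
        """Nesting depth of an ID, found by following parent mappings."""
        d = 0
        while id_string in parent_of:
            id_string = parent_of[id_string]
            d += 1
        return d

    for comp in order:
        candidates = []
        for up, out_kind, in_kind in incoming.get(comp, []):
            up_id = ids[up]          # upstream is already labeled
            if out_kind == "collection" and in_kind == "scalar":
                child = str(next_id)             # entry point: allocate new ID
                next_id += 1
                parent_of[child] = up_id
                candidates.append(child)
            elif out_kind == "scalar" and in_kind == "collection":
                candidates.append(parent_of[up_id])  # exit point: look up parent
            else:
                candidates.append(up_id)             # no boundary: propagate
        # Choose the most deeply nested candidate; sources get the root ID.
        ids[comp] = max(candidates, key=depth) if candidates else "0"
    return ids, parent_of

fig6_order = ["D1", "C1", "C2", "C3", "C4", "C5", "C6", "D2"]
fig6_links = [
    ("D1", "collection", "C1", "collection"),
    ("C1", "collection", "C2", "scalar"),
    ("C2", "collection", "C3", "scalar"),
    ("C3", "collection", "C4", "collection"),
    ("C4", "scalar", "C5", "collection"),
    ("C5", "scalar", "C6", "collection"),
    ("C6", "collection", "D2", "collection"),
]
fig6_ids, fig6_parents = assign_execution_sets(fig6_order, fig6_links)
```

With the port kinds stated in the walkthrough, the sketch labels D1, C1, C6 and D2 with ‘0’, C2 and C5 with ‘1’, and C3 and C4 with ‘2’, and it stores the two parent/child mappings ‘0/1’ and ‘1/2’.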

2.3 User Defined Execution Sets

In the examples described above, assignment algorithm(s) are used to automatically discover the execution sets present in a data processing graph without any user intervention. However, in some examples, a user may require functionality other than the functionality afforded by the assignment algorithm(s). In such cases, a user can explicitly add execution set entry points and exit points to explicitly define where execution sets begin and/or end. Referring to FIG. 7, a data processing graph 776 includes a first dataset 774, a first component 778, a second component 780, and a second dataset 790. Applying the assignment algorithm(s) described above to the data processing graph 776 would result in the discovery of a single execution set including the first component 778 and the second component 780. However, in this case, the user has explicitly defined two execution sets (i.e., a first execution set 782 and a second execution set 786) for the data processing graph 776. In particular, the user has inserted an execution set exit point component 784 into a link coming out of an output port of the first component 778 and has inserted an execution set entry point 788 into the link going into an input port of the second component 780. By adding the execution set exit point 784 and the execution set entry point 788 to the link between the first component 778 and the second component 780, the user has essentially broken what was a single execution set into two separate execution sets 782, 786.

In some examples, the user defines all of the execution set entry and exit points for a data processing graph. In other examples, the user defines some of the execution set entry and exit points and then leaves it to the assignment algorithm(s) to discover the remaining execution set entry points and exit points for the data processing graph.

2.4 Same Set as Relationships

In some examples, a user may wish to explicitly designate to which execution set a given component belongs. For example, referring to FIG. 8A, data processing graph 892 includes a first execution set 894 which receives data elements from a create data component 896 and a read table component 898. These components are similar to an input file component except they have different sources for the collection of data elements that they provide. For the create data component 896, instead of a scalar input port that specifies a file location, there is an (optional) scalar input port that specifies a number of records (data elements) to be produced, and there is also a parameter that specifies how each data element is to be generated. For the read table component 898, instead of a scalar input port that specifies a file location, there is an (optional) scalar input port that specifies a table in a database. The first execution set 894 includes a first component 891 and a second component 893 which together process the data elements from the create data component 896 and the read table component 898 to generate an output that is provided to a first dataset 899.

In FIG. 8A, the read table component 898 is external to the first execution set 894, meaning that it is run once and outputs a collection of data elements from its collection type output port. The collection of data elements traverses the boundary of the first execution set 894 and is provided to a collection type input port on the first component 891. For each parallel instance of the components in the execution set 894, a copy of the collection of data elements at the collection type input port on the first component 891 is created. Generally, whether a link is from a collection port, a scalar port, or a control port, a link between components that are assigned to different execution sets will have the data or control elements copied to all instances for links flowing into an execution set, and will have the data or control elements gathered from all instances for links flowing out of an execution set. Data elements are gathered into a collection and control elements are gathered into a vector, which may be handled appropriately (including possibly flagging it as an error) depending on the control logic of the downstream component.

Referring to FIG. 8B, in some examples, a user may require that the read table component 898 is executed for each parallel instance of the components in the execution set 894. To achieve this functionality, the user can specify a “same set as” relationship between the read table component 898 and the first component 891. As a result of the user specifying the “same set as” relationship, the read table component 898 is moved into the same execution set (i.e., the first execution set 894) as the first component 891. Since the read table component 898 is included in the first execution set 894, each parallel instance of the components in the first execution set 894 executes an instance of the read table component 898.

In some examples, the user can specify the “same set as” relationship by selecting a destination execution set from a menu associated with a source execution set, or by dragging a component from a source execution set to a destination execution set (e.g., via a user interface described in greater detail below). In some examples, error checking is performed to verify that the dragged component can legally be located in the destination execution set. For example, one possible requirement that can be enforced on any two components that are to have a “same set as” relationship to each other is that there must be at least one path through the data processing graph that includes both of those components.
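The path requirement mentioned above can be checked with a simple reachability test. The following Python sketch is illustrative only: the function names and the adjacency-map input are assumptions, and the component names in the example are hypothetical rather than taken from a figure. In a directed acyclic graph, a path includes both components exactly when one of them is reachable from the other.

```python
from typing import Dict, List

def reachable(successors: Dict[str, List[str]], start: str, goal: str) -> bool:
    """Depth-first search along downstream links from start to goal."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(successors.get(node, []))
    return False

def may_have_same_set_as(successors: Dict[str, List[str]], a: str, b: str) -> bool:
    """True if at least one path through the graph includes both a and b."""
    return reachable(successors, a, b) or reachable(successors, b, a)

# Hypothetical graph: two sources feed one component, which feeds a sink.
example_graph = {
    "source_a": ["transform"],
    "source_b": ["transform"],
    "transform": ["sink"],
}
```

Here `may_have_same_set_as(example_graph, "source_a", "sink")` holds, while the two sources share no path, so a “same set as” request between them would be flagged.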

2.5 Collection Data Replication

In some examples, multiple components in an execution set may each have scalar input ports connected to a single collection output port of an upstream component via an execution set entry point. Similarly, multiple components in an execution set may each have scalar output ports connected to a single collection input port of a component downstream from the execution set.

In some examples, to provide the same data from a collection type output port to the scalar input ports of multiple components, an execution set entry point creates replica(s) of each data element from the collection for each of the scalar input ports and provides the replica(s) to their corresponding scalar input ports. Similarly, to merge data output by the scalar output ports of multiple components (from different respective iterations of the execution set), an execution set exit point can receive output data elements from the scalar output ports, merge the output data elements, and then provide the merged output data elements to the collection input port of the downstream component. In general, the collection input port of the downstream component is configured to handle merged data elements.

Referring to FIG. 9, a data processing graph 923 includes a first dataset 924, a second dataset 926, and an execution set 928. The execution set 928 includes two components: a first component 930 and a second component 932. The first dataset 924 has a collection output port 934 that is connected to and provides a collection of data elements to an execution set entry point 936 of the execution set 928. The second dataset 926 has a collection input port 938 that is connected to and receives a collection of data elements from an execution set exit point 940 of the execution set 928.

Within the execution set 928, the first component 930 has a first scalar input port 942 and the second component 932 has a second scalar input port 944. Both the first scalar input port 942 and the second scalar input port 944 are connected to and receive individual data elements from the execution set entry point 936. As is described above, the execution set entry point 936 replicates data elements received from the collection output port 934 to provide a copy of each data element of a collection of data elements to each scalar input port connected to the execution set entry point 936. In FIG. 9, the execution set entry point 936 creates two replicas of each data element and provides one of the replicas to the first scalar input port 942 and the other replica to the second scalar input port 944. As is apparent from the figure, in some examples a visual representation of the execution set entry point 936 in a graphical user interface provides a representation of how many replicas of a data element are created by the execution set entry point 936. Also, in other examples, the different entry point indicators representing different copies of the replicas can be separated and distributed around the border of the execution set into as many components as there are within the execution set that need a copy of each replicated data element provided from the collection output port feeding the execution set.

The first component 930 and the second component 932 process their respective data elements and provide their respective processed data elements to the execution set exit point 940 via scalar output ports 946, 948. In some examples, the execution set exit point 940 groups the processed data elements into pairs and outputs the pairs of processed data elements to the collection input port 938 of the second dataset 926. As is apparent from the figure, in some examples a visual representation of the execution set exit point 940 in a graphical user interface provides a representation of how many replicas of a data element are grouped by the execution set exit point 940.
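The replication and grouping behavior described for FIG. 9 can be sketched in Python as follows. This is an illustration under assumptions, not the runtime described here: the function name `run_execution_set` is hypothetical, components are modeled as plain callables with one scalar input and one scalar output, and instances run sequentially for simplicity even though they could run in parallel.

```python
import copy
from typing import Callable, Iterable, List, Sequence, Tuple

def run_execution_set(collection: Iterable,
                      components: Sequence[Callable]) -> List[Tuple]:
    """Replicate each element to every component and group the outputs."""
    merged = []
    for element in collection:          # one instance per driving data element
        # Entry point: one replica of the element per scalar input port.
        replicas = [copy.deepcopy(element) for _ in components]
        # Each component processes its own replica.
        outputs = tuple(component(replica)
                        for component, replica in zip(components, replicas))
        # Exit point: group the outputs of this instance (pairs in FIG. 9).
        merged.append(outputs)
    return merged                       # collection for the downstream port

pairs = run_execution_set([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
```

With two components, each input element yields one pair in the output collection, which mirrors the grouping into pairs attributed to the execution set exit point 940.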

2.6 Resource Latching

In some examples, components in a given execution set may be run multiple times in parallel instances. In some examples, the components running parallel instances may need to access a shared resource. To prevent race conditions and other problems related to multiple processes accessing a shared resource, a latching mechanism may be used. In general, the latching mechanism allows one instance of the components in an execution set to obtain a runtime lock on the shared resource for the time that it takes the instance to finish running. While an instance has a shared resource latched, only the components in the instance have access to the shared resource and the components of other instances must wait for the latch to be released. After the instance has completed, it releases the runtime lock, allowing other instances to access the shared resource. The latching mechanism must both latch and unlatch the shared resource within a single execution set (e.g., using an explicit latch component at the upstream end and an explicit unlatch component at the downstream end). In some embodiments, such “latched execution sets” cannot be nested nor can they overlap one another.
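A minimal Python sketch of the latching idea, using a thread lock as a stand-in for the runtime lock, is shown below. The function name `run_latched_instances` and the shared running total are hypothetical, and the sketch models parallel instances as threads in a single process rather than as processes spread over computing nodes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

def run_latched_instances(elements: Iterable[int], max_workers: int = 4) -> int:
    """Run one instance per element, latching a shared total while updating it."""
    latch = threading.Lock()        # stand-in for the runtime lock
    shared = {"total": 0}           # hypothetical shared resource

    def instance(element: int) -> None:
        with latch:                 # latch at the upstream end of the instance
            current = shared["total"]
            shared["total"] = current + element
        # Leaving the block releases the latch (the downstream unlatch).

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(instance, elements))   # wait for every instance
    return shared["total"]
```

Because the read and the write of the shared total happen while the latch is held, no update is lost, so `run_latched_instances(range(1, 101))` returns 5050 regardless of how the instances are scheduled.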

2.7 Miscellaneous

It is noted that, while the global mapping based assignment algorithm is described in relation to a two dimensional data processing graph, it can also be used to discover execution sets for one dimensional data processing graphs.

In general, execution sets can be arbitrarily nested.

In general, an execution set has at most one driving data element that is received for each instance of the execution set from a linked output collection port. However, multiple scalar input ports may receive that same data element if it is explicitly or implicitly replicated crossing the boundary of the execution set.

In general, all output scalar ports that have links crossing the boundary of an execution set have all data elements, from each of multiple instances of the execution set, gathered into the same collection provided to a linked input collection port. But, if the execution set only has a single instance, the output scalar ports that have links crossing the boundary of the execution set may be linked to an input scalar port.

In general, a link between two ports of the same type can traverse an execution set boundary, assuming that the traversal of the execution set does not cause any cycles in the data processing graph.

In some examples, each execution set is assigned a unique identifier (e.g., a ‘1’) by default. In other examples, each execution set may be assigned an execution set ID path (e.g., ‘1/3/6’). In some examples, a user explicitly supplies an execution set ID string. The execution set ID string is not necessarily unique. In the case that an execution set ID string is not unique, the execution set ID string can be combined with the execution set ID strings of its parent, grandparent, and so on, resulting in a unique ID string.

In some examples, the global mapping based assignment algorithm results in components being assigned an ID string that corresponds to the most deeply nested execution set. In some examples, when execution sets are assigned execution set ID paths, the execution set ID paths are not necessarily unique. To compensate for situations where execution set ID paths are not unique, a constraint is placed on the execution set ID paths requiring that the execution set ID paths upstream of a given execution set must be “compatible,” where two execution set ID paths are compatible if and only if they are the same, or one is a proper prefix of the other. For example:

/1/2/3 and /1/2/3 are compatible

/1/2/3 and /1/2 are compatible

/1/2 and /1/2/3 are compatible

/1/2/3 and /1 are compatible

/1/2/3 and /1/4 are not compatible

/1/2/3 and /1/4/5 are not compatible
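The compatibility rule illustrated by the examples above can be expressed as a short Python check. The function name `compatible` is hypothetical. The sketch compares paths segment by segment, so that, under this reading of the rule, ‘/1/2’ is not treated as a prefix of ‘/1/23’.

```python
def compatible(path_a: str, path_b: str) -> bool:
    """True if the ID paths are the same or one is a proper prefix of the other."""
    segments_a = path_a.strip("/").split("/")
    segments_b = path_b.strip("/").split("/")
    shorter, longer = sorted((segments_a, segments_b), key=len)
    # The shorter path must match the leading segments of the longer one.
    return longer[:len(shorter)] == shorter
```

Applied to the six listed pairs, the check accepts the first four and rejects the last two.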

The embodiments described above impose essentially no ordering/concurrence constraints on execution of instances of the scalar blocks. But, in some embodiments, other inputs are provided to control permissible concurrency and required serialization of subsets of the data elements that are received from the collection feeding the execution set. In some embodiments, sequential processing according to a partial ordering may be imposed on some subsets of data elements.

By default, the instances of an execution set may run fully in parallel. However, in some cases, a user may desire different behavior. For example, if the data being processed is account-level data, the user may want to enforce certain restrictions on processing the data within each account. For example, the user may want to enforce serial execution. In such a case, any degree of parallelism may be permitted across accounts, but two data elements for the same account must not be processed at the same time (i.e., concurrently). Optionally, an additional restriction may be in-order processing, such that two data elements for the same account must not be processed out of order according to an order defined by a key, or by a received order, for example.

To accomplish this, a serialization key may be provided for an execution set. All data elements with the same value of the serialization key must be processed serially, and in some cases in a well-defined order. One way for the runtime system to enforce serial execution for data elements with the same serialization key is to partition execution set instances by serialization key: assigning instances whose driving data element has a particular serialization key (or hash value of the serialization key) to be executed on a particular computing node 152. At runtime, the system can ensure that work is evenly distributed across computing nodes 152 by scanning a collection of data elements to ensure queues of runnable tasks remain full. In a case in which there is not necessarily an explicitly defined order (such as in a collection), the order may be the same order as they were produced from an output port (even a collection output port) or an order associated with a different collation key that governs the order of processing within a serialization key group. In some cases, an execution set may be forced to run entirely serially by providing a predefined value as the serialization key.
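Partitioning by serialization key can be sketched in Python as follows. This is an illustration only: the function name `partition_by_serialization_key`, the choice of CRC-32 as the hash, and the modeling of computing nodes as numbered queues are all assumptions. Elements with equal keys land in the same queue in their received order, so a worker that drains its queue one element at a time processes them serially.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List

def partition_by_serialization_key(elements: Iterable,
                                   key_of: Callable[[object], Hashable],
                                   num_nodes: int) -> Dict[int, List]:
    """Assign each element to a node queue chosen by hashing its key."""
    queues: Dict[int, List] = defaultdict(list)
    for element in elements:
        key_bytes = str(key_of(element)).encode("utf-8")
        node = zlib.crc32(key_bytes) % num_nodes   # stable hash of the key
        queues[node].append(element)               # received order is kept
    return queues

# Hypothetical account-level data: (account, sequence number).
records = [("acct-A", 1), ("acct-B", 1), ("acct-A", 2), ("acct-C", 1), ("acct-A", 3)]
by_account = partition_by_serialization_key(records, lambda r: r[0], num_nodes=3)

# Supplying a predefined (constant) key forces a single serial queue.
fully_serial = partition_by_serialization_key(records, lambda r: 0, num_nodes=3)
```

A stable hash is used instead of Python's built-in `hash` because the latter is randomized per process for strings, which would make node assignment differ between runs.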

In some embodiments, an appearance that order has been preserved can be maintained, even if processing has not been performed strictly according to that order. If data at both the input and the output of an execution set are associated with a particular order (e.g., an order of elements within a vector), a user may wish to preserve that order. Even without serialization in the processing of data elements, output data elements can be sorted to restore an ordering associated with a corresponding set of input data elements, using an ordering key carried along with the data elements as they are processed, for example. Alternatively, output data elements that were produced in parallel may be merged in the same order in which they entered an execution set, without necessarily requiring an explicit sort operation to be performed.
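The first option above, carrying an ordering key along with each data element and sorting the outputs, can be sketched in Python as follows. The function name `process_preserving_order` is hypothetical, and a thread pool stands in for parallel instances of an execution set.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence

def process_preserving_order(elements: Sequence,
                             work: Callable,
                             max_workers: int = 4) -> List:
    """Process elements in parallel, then restore the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The position in the input serves as the ordering key.
        futures = {pool.submit(work, element): position
                   for position, element in enumerate(elements)}
        # Results arrive in completion order, tagged with their key.
        tagged = [(futures[future], future.result())
                  for future in as_completed(futures)]
    tagged.sort(key=lambda pair: pair[0])   # restore the input ordering
    return [result for _, result in tagged]
```

For example, `process_preserving_order([3, 1, 2], lambda x: x * x)` returns `[9, 1, 4]` even if the tasks finish in a different order.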

Various computational characteristics associated with executing code prepared for execution sets can be configured by the compiler/interpreter 120, with or without input from a user. For example, the embedded information described above for indicating how tasks corresponding to components within a particular execution set are to be performed may include any of the following. The information may include a compiler annotation that indicates tasks are to be performed completely serially (i.e., no parallelism). The information may include a compiler annotation that indicates tasks are to be performed with as much parallelism as is allowed by the sequencing constraints. The information may include a compiler annotation that indicates tasks related to the same key value are performed serially and tasks related to different key values are performed in parallel (i.e., serialization by key, as described above).

Compiler annotations or modifiers can be used to indicate any of a variety of computational characteristics:

-   concurrency (e.g., parallel, serial, serial by key, as described above)
-   precedence between different execution sets (e.g., all tasks of one execution set occur after all tasks of another execution set)
-   transactionality (e.g., the tasks of an execution set are processed as a database transaction)
-   resource latching (e.g., the tasks of an execution set are performed with a particular resource, such as a shared variable, locked, allowing the tasks to access the resource as an atomic unit)
-   ordering (e.g., ordering among data elements is preserved)
-   tuple size (e.g., number of data elements to be operated upon by each instance of an execution set)

The compiler/interpreter 120 may determine the characteristics based on automatically analyzing properties of an execution set or of the data processing graph as a whole, and/or based on receiving input from a user (e.g., user annotations within the graph). For example, if key values are referenced in an execution set, a compiler annotation may indicate serialization by key. If a resource is used within an execution set, compiler modifiers may enable locking/unlocking that resource before/after the execution set. If there are database operations within an execution set, each instance of the execution set may be configured to execute as a database transaction. If the number of cores available can be determined at compile-time, a compiler annotation may indicate that each core will execute an instance of an execution set on a tuple of data items that consists of a number of data items equal to the total size of the collection divided by the number of cores.
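The tuple-size computation in the last example above can be sketched as follows. This is an illustration under stated assumptions rather than the compiler's actual behavior: the function name `partition_for_cores` is hypothetical, and rounding the quotient up (so that no data item is left unassigned when the collection size is not a multiple of the number of cores) is an assumption not stated in the description.

```python
import math


def partition_for_cores(items, num_cores):
    """Split items into tuples of about len(items) / num_cores elements.

    Hypothetical helper; rounding up is an assumption made so that
    every data item is assigned to some tuple.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    if not items:
        return []
    tuple_size = math.ceil(len(items) / num_cores)
    # Each slice is the tuple of data items handled by one instance.
    return [items[i:i + tuple_size] for i in range(0, len(items), tuple_size)]
```

With ten data items and four cores, this sketch produces tuples of three items each, with the final tuple holding the single remaining item.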

The compiler annotations and modifiers can be added to code prepared in the target language, such as a suitable higher-level language (e.g., DML), or lower-level executable code, or a target intermediate form of the data processing graph. For example, the compiler/interpreter 120 may insert components into the data processing graph that explicitly indicate an entry point or exit point to an execution set, or components to begin/end transactions can be placed at entry/exit points of a set of components for processing a transaction, or components can be used to lock/unlock resources. Alternatively, the compiler/interpreter 120 may add a modifier as a modified type of data flow link.

3 User Interface for Data Processing Graphs

In some examples, a user interface allows a user to develop a data processing graph by dragging components onto a canvas and connecting ports of the components together using links. In some examples, the user interface repeatedly applies the assignment algorithm(s) described above to the data processing graph as the user develops the data processing graph. For example, as a user adds a component to the data processing graph being developed, the assignment algorithm(s) may be applied to the graph with the added components. The resulting execution sets discovered by the assignment algorithm(s) can then be displayed as boxes drawn around components in the user interface, for example, or as arbitrarily shaped regions enveloping the components, which can be distinguished by a unique color, shading, texture, or label used to render the region containing components in the same execution set. In some examples, the user can then modify the execution sets discovered by the assignment algorithm(s) by adding or removing components to or from execution sets. In some examples, the assignment algorithm(s) verify that the modified execution sets are legal. For example, there may be some configurations of components and links between various ports that could potentially be divided into execution sets in any of a variety of legal ways. In such ambiguous cases, the assignment algorithm may select one assignment of execution sets by default, but a user may have intended a different assignment of execution sets, in which case the user can modify the assignment (e.g., by inserting an exit point to close an execution set earlier in a chain of components). Alternatively, the assignment algorithm could be configured to recognize ambiguous configurations in which multiple legal assignments are possible, and prompt the user for input to select one.

Referring to FIG. 10A, a user has dragged three components, a first dataset 1022, a first compute component 1024, and a second dataset 1026, onto a canvas 1028 of a data processing graph development user interface. The user has not yet connected the ports of the components 1022, 1024, 1026 together using links, and the assignment algorithm(s) have not yet discovered any execution sets in the data processing graph (other than the root execution set).

Referring to FIG. 10B, when the user connects the ports of the components 1022, 1024, 1026 together with links, the assignment algorithm(s) automatically discover a first execution set 1030, the first execution set 1030 including the first compute component 1024. The first execution set 1030 is displayed to the user through the user interface. As a user continues to add components and links to the graph, the assignment algorithm(s) automatically discover and display execution sets through the user interface.

Referring to FIG. 10C, in some examples, a user may need to break the links (e.g., to insert another component into the link). In such examples, if the assignment algorithm(s) were allowed to re-analyze the data processing graph, the first execution set 1030 would be removed, possibly causing disruption and loss of work for the user.

To avoid such a disruption, when the user removes flows or components from the data processing graph, the assignment algorithm(s) may not be executed but instead the remaining components and their execution set associations are left untouched. For example, in FIG. 10C, with its input and output ports disconnected, the first component 1024 is still included in the first execution set 1030. In some examples, when disconnected components are reconnected, the assignment algorithm(s) are permitted to automatically discover and display any execution sets associated with the reconnected components.

In some examples, if a component of a data processing graph does not have an explicit (e.g., user defined) execution set designation, the assignment algorithm(s) are allowed to discover which execution set the component belongs in. Otherwise, if a component has an explicit, user defined execution set designation, the assignment algorithm(s) are not allowed to choose in which execution set the component is included. For example, if a user manually moves a component into a given execution set, the assignment algorithm(s) are not allowed to include the component in any execution set other than the user designated execution set. That is, any user modifications to the data processing graph cannot be overridden by the assignment algorithm(s).

In some examples, the user interface allows a user to use a gesture or other interaction with an input device to promote a component into a given execution set and/or demote a component out of a given execution set. In some examples, the user can promote or demote components using a menu option or other affordance. In other examples, the user can simply drag a component into a desired execution set in the user interface.

In some examples, the user interface allows users to specify one or more constraints for the execution sets in a data processing graph. For example, a user can constrain an execution set to run no more than N times in parallel at a given time.

In some examples, the compiler/interpreter 120 receives a representation of the data processing graph that includes a mixture of manually defined execution sets and execution sets discovered by the assignment algorithm.

In some examples, a user can define another type of execution set, referred to as an enable/suppress execution set, using the interface. For example, a user can draw a box around one or more components that they wish to be included in the enable/suppress execution set. The enable/suppress execution set includes the one or more components and has a scalar input port. If a scalar output port of an upstream component provides one data element to the scalar input port of the enable/suppress execution set, the components in the enable/suppress execution set are allowed to execute. If the scalar output port of the upstream component provides zero data elements to the scalar input port of the enable/suppress execution set, the components included in the enable/suppress execution set are suppressed. Any execution set (including an enable/suppress execution set) can include control input and output ports that can be used to determine whether the entire execution set will be executed or not, and to propagate control signals to other components or execution sets. If an execution set is parallelized (i.e., has multiple instances), then the input control port must be activated before any instance is executed, and the output control port is activated after all instances have completed execution. In some examples, these input and output control ports are provided by placing visual representations of the ports on the border of an execution set. In other examples, these input and output control ports are provided by placing them on an additional component in front of an execution set. For example, this additional “forall component” may be inserted (e.g., automatically by the user interface, or manually by a user) between the upstream collection output data port and the entry point indicator, or in place of the entry point indicator (i.e., between the upstream collection output data port and the driving input scalar data port).

As is noted above in relation to FIG. 7, in some examples, a user can explicitly define execution set entry points and exit points by placing execution set entry point and exit point components along the flows of the data processing graph.

In some examples, the user interface provides real time feedback to notify a user when their graph includes an illegal operation. For example, if there is a conflict caused by the component being in the user designated execution set, the assignment algorithm(s) may issue a warning to the user through the user interface. To provide real time feedback, the assignment algorithm(s) apply validation rules to a data processing graph to inform a user whether the data processing graph is legal. Referring to FIG. 11A, one example of an illegal data processing graph configuration 1195 includes two data sources, a first data source 1191 feeding a first collection of data elements to a scalar port of a first component 1102 in a first execution set 1197 and a second data source 1198 feeding a second collection of data elements to a scalar port of a second component 1104 in a second execution set 1199. The second execution set 1199 outputs a third collection of data elements which are then input to a scalar data port of a third component 1106 in the first execution set 1197. Since two different collections of data elements are connected to different scalar ports in the first execution set 1197, there is no way of knowing how many parallel instances of the components in the first execution set 1197 should be instantiated (since one instance of the components is generated for each data element present at the boundary of the first execution set 1197). In some examples, the user is notified of this conflict by displaying an error indicator 1108 on, for example, the second component 1104.

Referring to FIG. 11B, another example of an illegal data processing configuration 1110 includes a data source 1112 feeding a collection of data elements to a scalar input port of a first component 1114 in a first execution set 1116. A scalar output of the first component 1114 provides its output, as a collection of data, to a collection port of a second component 1118 outside of the first execution set 1116. The second component 1118 provides a collection of data elements from a collection type output port to a scalar data port of a third component 1120 in the first execution set 1116.

By passing a collection of data elements from the collection type output port of the first component 1114 out of the first execution set 1116, processing the collection of data elements at the second component 1118, and then passing the processed collection of data elements back into the scalar port of the third component 1120, an “execution set loop” is defined.

In general, execution set loops are illegal since they are detrimental to execution ordering. For example, it is generally permissible to have additional flows going into an execution set or coming out of an execution set since, for inputs, the input data can be buffered prior to executing the execution set and, for outputs, the output data can be gathered after the execution set completes execution. However, this is not possible if an external component is required to run both before and after the execution set.
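One possible way to check for the condition just described, offered only as a sketch and not as the validation rule actually applied by the assignment algorithm(s), is to look for components outside an execution set that are both downstream of some member and upstream of some member. The function names and the representation of links as (source, destination) pairs are assumptions introduced for illustration.

```python
def reachable(adjacency, starts):
    """Return every node reachable from starts by following adjacency."""
    seen = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def execution_set_loop_components(links, members):
    """Return external components that must run both after and before the set.

    links is an iterable of (source, destination) component pairs and
    members is the set of components in the execution set (hypothetical
    representation).
    """
    forward, backward = {}, {}
    for src, dst in links:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)
    downstream = reachable(forward, members)   # runs after some member
    upstream = reachable(backward, members)    # runs before some member
    return (downstream & upstream) - set(members)
```

Applied to the configuration of FIG. 11B, with links 1112 to 1114, 1114 to 1118, and 1118 to 1120 and members 1114 and 1120, the sketch reports the second component 1118 as the external component that closes the loop.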

In some examples, the user is notified of execution set loops by displaying an error indicator 1108 on one or more of the components.

In some examples, a data processing graph is considered to be illegal if each execution set entry point is not matched by at least one corresponding execution set exit point. Alternatively, an execution set that has an entry point but no corresponding exit point may be allowed as a user-defined execution set, even if it is not recognized automatically by the assignment algorithm. In those cases, the execution set may end (without providing any output data elements) after the most downstream component(s) finish execution. In some examples, a data processing graph is considered to be illegal if each latch operation is not matched by a corresponding unlatch operation. Alternatively, an unlatch operation can be inferred if none is explicitly specified, and only indicated as illegal if the inferred unlatch operation would need to be in a different execution set from the latch operation. In some examples, a data processing graph is considered to be illegal if a latch operation and its corresponding unlatch operation are not both present in the same execution set.

4 Visual Display of Nested Execution Sets

As introduced above, in at least some embodiments, the execution sets that are discovered by execution of the assignment algorithm are displayed visually to the user, for example, using shaped regions enveloping the components in the visual display. The visual representations of the sets in general include an outline or other indication of a spatial extent of each set within the visual representation of the computation. In general, a particular visual representation (e.g., on a user interface) may include multiple sets, and sets may be nested one within another. It has been found that the nature of the outlines or spatial extent can affect the utility of the visual representation by making the membership of components into their corresponding sets more or less clear.

Because the syntactic and/or semantic interpretation of the computation depends on the assignment of components to the execution sets, clear visual representation of the membership also provides clear representation of the interpretation of the computation to the user. By clearly visually representing the interpretation, the user can more easily detect unintended interpretation of the computation that would result in corresponding unintended runtime computation using the program specification. However, because the placement of the components may be based on other considerations than their membership in execution sets, for example, to make clear the flow of data or control, the shapes of the regions enclosing the execution sets may be complex, resulting in potentially complex, confusing, or distracting presentation of the execution sets. Approaches below provide one or more ways of forming the regions for the execution sets that are clear and easily interpreted by the user.

Referring to FIG. 15, in an embodiment of the approach introduced above with reference to FIG. 1, a user 1502 provides a graph-based program specification 1510 via a graphical user interface 1522 of a compiler/interpreter 1520. This program specification includes an identification of components of the program. The user interacts with a visual representation of the overall program, with each component itself having a visual representation. For example, the overall program is visually represented on a two-dimensional frame, and the visual representation of each component has a location within that frame, and a shape and/or spatial extent within that frame. As part of an input processing 1528, information in the program specification characterizing the components and/or the interconnections is used to identify the execution sets 1526 of the program specification. These execution sets 1526 are passed to a compilation 1532 to produce a runtime specification 1530. The runtime specification is later used to process data according to the specification, and may include an execution set control element 1552, which coordinates the execution of the execution sets. Alternatively, the control of the execution sets is embedded in the runtime specification 1530 without an explicit execution set control element.

The same execution sets 1526 that are used to determine the runtime specification 1530 are passed to an output processing element 1524 of the graphical user interface 1522. This output processing forms a visual representation 1560 of the program specification for presentation to the user 1502. This visual representation includes the visual representation of the execution sets, which in general includes a representation of the spatial regions and/or outlines of the spatial regions associated with the execution sets.

Although described in the context of the computation system introduced above, it should be understood that the approach to providing a visual representation of potentially nested sets of interconnected components is applicable to other systems, particularly where the assignment of components to sets is based on a syntactic and/or semantic interpretation (e.g., a “parse”) of the graph-based specification. In general, each component has a visual representation that itself has a spatial extent. In the system introduced above, some or all of the components have rectangular spatial extent in the user interface, although other shapes may be used (e.g., circles, ovals, icons such as representations of disks, etc.). In the discussion below, the terms “set” and “group” are generally used interchangeably, and “component” and “block” are generally used interchangeably, so the processing may be expressed as forming the spatial regions associated with nested groups of blocks. Also, each set is associated with a label such that the labels are partially ordered where the partial ordering represents the nesting of the sets. As introduced above, the labels may take the form “/a/b/c” (i.e., a sequence of symbols, in this case initiated/rooted and separated by “/”), and a first label envelops (i.e., is greater than) a second label if it forms a prefix of the second label. Therefore a set with label “/a/b” envelops a set with label “/a/b/c” and is disjoint with (i.e., unordered with respect to) a set with label “/a/d”.
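The partial ordering of labels described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the function names are hypothetical, and treating “envelops” as a strict prefix relation (so that a label does not envelop itself) is an assumption. The labels are compared symbol by symbol rather than character by character, so that, for example, “/a/b” is not mistaken for a prefix of “/a/bc”.

```python
def parse_label(label):
    """Split a label such as '/a/b/c' into its sequence of symbols."""
    return tuple(part for part in label.split("/") if part)


def envelops(outer, inner):
    """True if outer is a proper prefix of inner (assumed strict)."""
    o, i = parse_label(outer), parse_label(inner)
    return len(o) < len(i) and i[:len(o)] == o


def disjoint(a, b):
    """True if the two labels are unordered with respect to each other."""
    return a != b and not envelops(a, b) and not envelops(b, a)
```

Under this sketch, `envelops("/a/b", "/a/b/c")` and `disjoint("/a/b", "/a/d")` both hold, matching the example in the text.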

The output processing 1524 uses the locations and spatial characteristics of the components provided by the user in the program specification 1510, and the identified execution sets 1526 to form the visual representations of the sets. Referring to FIG. 16, in a simple example, components 1621-1623 of a program specification are shown in their visual representation on a frame 1610. In this example, all the components belong to the whole set, labeled “/0”, and components 1622 and 1623 belong to the nested set labeled “/0/2”. The spatial region 1630 associated with the “/0/2” label is shaded in this example, and the outline 1631 of the region is presented in the visual representation. It should be noted that even in this simple example, because of the positioning of the components, the “/0/2” set cannot be enveloped in a rectangular region that also excludes component 1621, which is not in the set. Therefore it should be clear that non-rectangular outlines of the spatial regions are generally necessary.

In this embodiment, the output processing to form the spatial regions has a number of requirements, including:

-   Outlines must not envelop blocks that are not within the corresponding groups.
-   Outlines must be separated from the blocks and from one another, for example, by minimum distances.

In this embodiment, the output processing to form the spatial regions has a number of goals, including:

-   Blocks, and which blocks are nested inside of which other blocks, should be immediately obvious and immediately recognizable using the user's natural perceptual shape and hue/color recognition.
-   The user should not have to deduce grouping just from labeling or interconnection of blocks.
-   Outlines should be smoothly curved, under the premise that sharp corners and/or lots of fine detail on the outlines diminishes the user's ability to easily recognize the grouping and also distinguish between several groupings at a glance.
-   Outlines should not have lots of extraneous space inside them, under the premise that such extra space detracts from and obscures the information they are providing.

In an implementation that addresses these design goals, the process of determining the outlines and spatial regions for the groups makes use of the following steps.

    1. Accept the locations (and shapes if necessary) and group labels of the non-overlapping blocks of the program specification.
    2. Compute initial outlines of the regions that satisfy the requirements of correctly enveloping blocks and separating the outlines.
    3. Improve the outlines according to the goals.
    4. Apply visual logic to the sequencing of hues/colors with respect to child and sibling blocks.

As described in more detail below, in an embodiment, the step (2.) of computing the initial outlines is performed in a number of steps:

    a. Tessellate the frame.
    b. Identify intercepts of the outlines at the edges of the tiles of the tessellation.
    c. Connect the intercepts to form the initial outlines.

The term “to tessellate” a space should be understood broadly to mean to divide up (i.e., partition) the space into shapes (e.g., polygons) referred to as “tiles,” with one example of such a dividing up making use of the same shapes, such as triangles, to divide up the space. The result of “tessellating” a region is referred to as a “tessellation.”

As described in more detail below, in an embodiment, the step (3.) of improving the outlines is performed in two steps:

    a. Iteratively adjust the intercepts.
    b. Round “corners” of the outlines.

Steps 2a-c and 3a are illustrated in FIGS. 17A-C for the example illustrated in FIG. 16. Referring to FIG. 17A, a first step is to identify a set of points 1710, 1711 and 1712 (only representative ones are labeled in the figure) on the blocks and frame of the visual representation of the program specification. Points 1710 are located at the vertices of the visual representations of the blocks. In this example in which each block is represented as a rectangle, four points 1710 are at the corners of each block's rectangle. Points 1711 are located at the edge of the frame at horizontal and vertical projections of the closest points 1710 to the edge. For example, of all of the corner points 1710, only those that lie on (or are within a particular threshold of) a perimeter polygon connecting certain outer block vertices and surrounding all of the blocks are selected for projecting the points 1711 onto the frame. Such a perimeter polygon can be computed, for example, in one of multiple iterations of the tessellation of step 2a. Points 1712 are located at the corners of the frame.

Continuing with step (2a.), the points 1710-1712 are linked by edges 1720 (only a representative one of which is labeled) to form a mesh of triangles. In this embodiment, Fortune's procedure is used to select the edges to form a Delaunay mesh of triangles on the points, as described in Fortune, S., “A Sweepline Algorithm for Voronoi Diagrams,” Algorithmica, 2:153-174, 1987, which is incorporated herein by reference. However, it should be understood that other approaches to forming the triangle mesh can be used with similar overall effect.

Each edge 1720 of the triangular mesh has two end points 1710-1712. Each end point is associated with a label. A point 1710, which is at a vertex of a block, is associated with a label of the block. Points 1711 and 1712, which are on the edge of the frame, are associated with a root label, in this example, “/0”. Each edge that has different labels associated with its endpoints has one or more intercept points 1730 for the outlines.

The number of intercept points 1730 on an edge 1720, and the locations of the intercept points along the edge, depend on the labels of the end points. If the pattern of labels is “/a” and “/a/b” then there is a single intercept point 1730, which is associated with the outline of the spatial region associated with the label “/a/b”. In this case, the intercept point is placed at the midpoint of the edge 1720. Similarly, if the pattern of labels is “/a/b” and “/a/b/c”, then there is also only a single intercept point on the edge (i.e., associated with the outline of the “/a/b/c” region).

If the pattern of labels is “/a” and “/a/b/c/d”, for example, then there will be three intercept points 1730 on the edge, associated with spatial regions labeled “/a/b/c/d”, “/a/b/c”, and “/a/b”. In some examples, these points are evenly spaced, dividing the edge into four equal length segments. In some examples, spacing between the intercept points 1730 and between the intercept points 1730 and the mesh points 1710-1712 are not necessarily equal, for example, providing a minimum separation between the mesh points and the intercept points, and uniform spacing between the intercept points.

If the pattern of labels has a common prefix and different suffixes, such as in “/a/b” and “/a/c” (i.e., a prefix “/a” and suffixes “/b” and “/c”) then the number of points depends on the lengths of the suffixes. In this example, two intercept points are needed: one for the boundary between the “/a/b” region and the “/a” region, and one for the boundary between the “/a” region and the “/a/c” region. Therefore, the number of points to be placed on an edge is the sum of the lengths of the suffixes of the two labels.
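The rule for the number of intercept points can be illustrated with the following sketch, which also places the points at even spacing along the edge (one of the spacing options mentioned above). The function name `intercepts`, the return format of (fractional position, outline label) pairs, and the ordering of outlines from the first end point toward the second are assumptions introduced for illustration.

```python
def intercepts(label_a, label_b):
    """List (position, outline_label) pairs for an edge from label_a to label_b.

    Positions are fractions of the edge length measured from the end
    point labeled label_a, evenly spaced (an illustrative choice).
    """
    a = [s for s in label_a.split("/") if s]
    b = [s for s in label_b.split("/") if s]
    common = 0
    while common < min(len(a), len(b)) and a[common] == b[common]:
        common += 1
    # Moving away from end point a, outlines are exited innermost first.
    leaving = ["/" + "/".join(a[:n]) for n in range(len(a), common, -1)]
    # Approaching end point b, outlines are entered outermost first.
    entering = ["/" + "/".join(b[:n]) for n in range(common + 1, len(b) + 1)]
    outlines = leaving + entering
    count = len(outlines)  # equals the sum of the two suffix lengths
    return [((k + 1) / (count + 1), name) for k, name in enumerate(outlines)]
```

For the label patterns discussed above, the sketch gives one intercept at the midpoint for “/a” and “/a/b”, three intercepts for “/a” and “/a/b/c/d”, and two intercepts for “/a/b” and “/a/c”.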

The example shown in FIG. 17A is relatively simple, with either no intercept points on edges where both end points are labeled “/0”, or one intercept point on edges that have one end point labeled “/0” and one labeled “/0/2”. In FIG. 17A, these intercept points are placed at the midpoints of the edges.

Referring to FIG. 17B, and continuing with step (2c.), the intercept points that correspond to a same outline are joined by line segments 1740 to form the closed boundary, which satisfies the requirements for the spatial regions.

To later support rendering filled outlines, drawing outer ones first before drawing inner ones (so that drawing the outer ones does not hide the inner ones), it is useful to sort these outlines by increasing nesting depth. One way to determine whether an outline is nested within another outline is to begin at a point 1730 on the outline, and traverse mesh edges 1720 to reach an edge point 1711-1712 of the frame. Along the traversed path, any other outline that is crossed an odd number of times encloses the starting outline. Using this procedure repeatedly, the partial ordering of the regions by geometric nesting (as opposed to the labeling) is established.
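The odd-crossing test and the resulting drawing order can be sketched as follows. The sketch assumes, purely for illustration, that the traversal of mesh edges has already been carried out and that its result is available as a list of the outline identifiers crossed on the way to the frame; the function names and this input representation are hypothetical.

```python
from collections import Counter


def enclosing_outlines(start, crossed):
    """Return the outlines crossed an odd number of times on the path.

    crossed lists, in order, the outlines crossed while traversing mesh
    edges from a point on outline start to the edge of the frame.
    """
    counts = Counter(o for o in crossed if o != start)
    return {o for o, n in counts.items() if n % 2 == 1}


def draw_order(paths):
    """Sort outlines by increasing nesting depth (outer ones first).

    paths maps each outline to the list of outlines crossed on its path
    to the frame edge.
    """
    depth = {o: len(enclosing_outlines(o, crossed)) for o, crossed in paths.items()}
    return sorted(paths, key=lambda o: depth[o])
```

An outline that is crossed twice along a path, for example because the path enters and then leaves it, is not counted as enclosing the starting outline.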

It should be recognized that the procedure provided above does not necessarily result in contiguous spatial regions, and such disconnected regions may provide an improved visual representation as compared to a forced contiguous form. Referring to FIGS. 18A-B, two blocks 1821-1822 are labeled “/0/a” and two blocks 1823-1824 are labeled “/0/b”. Following the procedure outlined above, the triangle mesh and intercept points are located as illustrated in FIG. 18A. The outlines of the regions are then formed as illustrated in FIG. 18B. Note that the blocks 1821-1822 with label “/0/a” are found in two disjoint spatial regions 1830-1831 of the visual representation, while the blocks 1823-1824 are found within a single connected region 1840 of the visual representation. Therefore, it should be clear that in some cases, the region associated with a particular label (e.g., “/0/a”) may be disconnected and formed of multiple constituent parts.

The part of the procedure described above does not generally result in outlines that meet the desired criteria. Referring again to FIG. 17C, the step (3.) of improving the outlines is performed by first relocating the intercept points 1730 along the edges 1720 of the triangular mesh in step (3a). In one implementation, this relocation is performed in an iterative “relaxation” approach. Intercept points are considered in turn, for example, in a predefined order, or in a random order. If the length of the outline of that intercept point can be reduced by moving the point along the edge, then it is moved subject to spacing constraints. In some examples, the spacing constraints include a minimum distance between an end point 1710-1712 of an edge and the intercept point and a minimum distance between intercept points. In some examples these minimum distances are predetermined. In other examples, these minimum distances are computed according to the length of the edge and the number of intercept points on the edge. In some examples, the minimum distances are computed to provide a consistent minimum distance between an end point on a block and the intercept points, for example, to yield a uniform visual appearance.
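A simplified sketch of such a relaxation is given below for a single closed outline with one intercept point per mesh edge. It is an illustration under several assumptions and not the procedure of any particular embodiment: each intercept is represented by a fractional position along its edge, the spacing constraint is reduced to a single fractional `margin` kept from both end points, the points are visited in a fixed order, and a ternary search is used to find the position that minimizes the summed distance to the two neighboring intercepts (which is a convex function of the position). All function and parameter names are hypothetical.

```python
import math


def point_on(edge, t):
    """Point at fraction t along edge ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = edge
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def outline_length(edges, ts):
    """Length of the closed outline through the intercept points."""
    pts = [point_on(e, t) for e, t in zip(edges, ts)]
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(len(pts)))


def relax(edges, ts, margin=0.1, sweeps=20):
    """Move each intercept along its edge to shorten the outline."""
    ts = list(ts)
    n = len(edges)
    for _ in range(sweeps):
        for i in range(n):
            prev_pt = point_on(edges[i - 1], ts[i - 1])
            next_pt = point_on(edges[(i + 1) % n], ts[(i + 1) % n])

            def cost(t):
                p = point_on(edges[i], t)
                return math.dist(p, prev_pt) + math.dist(p, next_pt)

            # Ternary search within the allowed span of the edge.
            lo, hi = margin, 1.0 - margin
            for _ in range(40):
                m1 = lo + (hi - lo) / 3
                m2 = hi - (hi - lo) / 3
                if cost(m1) < cost(m2):
                    hi = m2
                else:
                    lo = m1
            ts[i] = (lo + hi) / 2
    return ts
```

For a square block with mesh edges running from its corners to the corners of a surrounding frame, starting from the midpoints, this sketch pulls each intercept toward the block until the margin is reached, which shortens the outline around the block.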

In FIG. 17C, two representative intercept points 1730 are illustrated before and after the improvement procedure, and the remaining intercept points are illustrated at their final locations.

Referring to FIGS. 17D-E, in some examples, the adjustment of the intercept points is constrained such that the intercept points 1730 and/or the line segments joining the intercept points 1739 do not encroach into “bumper” regions around the components. For example, a circular bumper 1780 for an outline is positioned (at least conceptually) at each vertex of a component. In FIG. 17D, an approach in which the intercept points are constrained to not enter the bumper region is illustrated, even if the line segments joining the points encroach into the bumpers. As an alternative, not only are the intercept points constrained to not enter into the bumpers, the line segments joining the intercept points are also constrained to not enter the bumpers. These approaches prevent the outline from approaching too closely to the vertex in a direction that does not have an edge of a tile on it. The diameters of the bumpers are determined, for example, with increasing diameter in the manner that the minimum spacing is determined for the intercepts on the edges of the tiles, for example, reducing the diameters from a default size when there is insufficient room between components for the default size. Referring to FIG. 17E, it should be understood that in other embodiments, the outlines ideally follow the boundaries of the bumpers, thereby both providing smoothness and achieving an optimality of path for the outlines.

Referring back to FIG. 16, the outline 1631 is obtained by processing the outline 1740 of FIG. 17C by “rounding” the corners of the outline at the intercept points. For example, for corners associated with a point 1710 at a vertex of a block, a circular arc centered at that point is used to replace part of the outline. Other approaches to smoothing or rounding the outlines can be used, for example, using other approaches to introduce circular arcs, by using splines, etc. It should be understood that the approach to using the “bumpers” illustrated in FIG. 17D followed by rounding corners may be considered to be one approximation of an optimal “tightening” of a contour subject to the constraints of minimum separation of contours from one another and from the components. This approximation provides a computationally efficient approach that visually provides representations that may be indistinguishable in their quality as compared to optimal contours.

In some embodiments, the requirement that the blocks in the original specification are non-overlapping is relaxed by first identifying non-overlapping parts of the blocks, and then performing the approach as described above. For example, as shown in FIG. 19, two overlapping rectangles, labeled “/0/a” and “/0/b”, may be first replaced with two irregular polygons with rectilinear sides (which can be considered equivalently to be abutted rectangles). The outlines of the resulting regions are then computed using the irregular polygons, and then these computed outlines are used in the rendering of the original blocks.

In alternatives of the approach described above, in addition to blocks, lines may be enveloped in spatial regions. For example, in addition to points 1710 at vertices of blocks, additional points along links (e.g., visually represented as lines) joining blocks can be used to cause links between blocks with a same label to be within the spatial region for that label. As another example, certain intercept points 1730 may be introduced, for example at points along links at the location of a symbol characterizing a change in data handling (e.g., a start or an end of a “for all” section), thereby causing an outline to pass through that symbol.

A more complex example is illustrated in FIG. 20, which shows the triangular mesh that was generated and used to form the resulting contours for the nested spatial regions for the data processing graph program specification shown in FIG. 2B.

5 State Machine for Control Graphs

In the process of preparing a data processing graph for execution, the compiler/interpreter 120 also generates a control graph in a control graph generation procedure. In some implementations, generating a control graph includes generating executable code for performing the tasks corresponding to individual components and code corresponding to the various links among the components that determine flow of data and control among those tasks. This includes transfer of data and control among the hierarchy of execution sets discovered by the compiler/interpreter 120.

Part of generating such executable code includes generating, in some data structure representations, a corresponding control graph for each execution set, including any enable/suppress execution sets. Any nested execution sets within an execution set are treated as a single component representing that nested execution set for purposes of generating a control graph. The ports of this representative component correspond to ports of components within the nested execution set that are connected to links that cross the boundary of the nested execution set. The compiler/interpreter 120 will then use this control graph to generate control code. This generated control code effectively implements a state machine that controls execution at runtime. In particular, once execution begins, this generated control code controls when a component or a port transitions from one state to another of this state machine.

FIG. 12A shows an example of how the compiler/interpreter 120 combines first and second component pairs 1202, 1204 of a root execution set into a control graph 1206. In this example, the first component pair 1202 includes first and second components 1208, 1210 connected by respective collection data ports 1212, 1214. The second component pair 1204 includes third and fourth components 1216, 1218 connected by respective scalar data ports 1220, 1222.

The compiler/interpreter 120 creates a control graph by adding a begin component 1224 and a finish component 1226 and connecting components to the begin and finish components 1224, 1226 as dictated by the topology of the data processing graph. The begin and finish components do not perform any computing tasks, but are used by the compiler/interpreter 120 to manage the control signals that will be used to begin execution of certain components and determine when all components in the execution set have finished execution.

To determine whether a particular component needs to be connected to a begin component 1224, the compiler/interpreter 120 inspects the inputs to that component to determine whether it is already designated to begin executing based on an existing link to an upstream serial port. Serial ports, as described above, include both control ports and scalar ports. A component that has no such link needs a connection to the begin component 1224.

For example, if a component has no link to its control input port, there is the possibility that it will never begin executing since there would never be a control signal to tell it to start. On the other hand, even if there were no control input, it is possible, depending on the type of data input that a component has, for arrival of data to trigger execution of that component. For example, if a component has a scalar input port, then even in the absence of a control signal at its control input port, that component will still begin execution as soon as it sees data at its scalar input port. On the other hand, if a component only has a collection data input, then this will not happen. If such a component does not have a control input or scalar data input to trigger execution, it will need a connection to the begin component 1224.

In the context of FIG. 12A, the first component 1208 has neither a control input nor a scalar data input. Thus, there would be no way for the first component 1208 to begin execution by itself. Therefore, the first component 1208 must be linked to the begin component 1224. The third component 1216 likewise has neither a control input nor a scalar data input. Therefore, the third component 1216 must also be linked to the begin component 1224.

The fourth component 1218 has no control input. But it is connected to receive a scalar data input from the third component 1216. Therefore, it will begin execution upon receiving data through its input scalar port 1222. Thus, the fourth component 1218 does not require a connection to the begin component 1224.

The second component 1210 is configured to receive data from the first component 1208. However, this data is received at an input collection port 1214 and not at an input scalar port. As a result, the second component 1210, like the first, must also be connected to the begin component 1224.

The compiler/interpreter 120 also needs to identify which of the components will need to be connected to the finish component 1226.

In general, a component is connected to a finish component 1226 when it has neither a control output link nor a data output link (of any type). In the diagram on the left side of FIG. 12A, this condition is only satisfied by the second component 1210 and the fourth component 1218. Thus, as shown on the right side of FIG. 12A, only these two components are connected to the finish component 1226.

FIG. 12B is similar to FIG. 12A except that a control link exists between the first component 1208 and the third component 1216 on the left side of the figure. Consistent with the rules, it is no longer necessary to connect the third component 1216 to the begin component 1224 in the resulting alternative control graph 1206′.
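The begin and finish connection rules can be illustrated with a short sketch. It assumes a simplified representation in which each link records only its upstream component, its downstream component, and a kind of “control”, “scalar”, or “collection”. The `Link` class and the function `begin_and_finish` are hypothetical names introduced for this illustration, and the finish rule is read here as applying to components with no output link of any kind. The components are identified by their reference numerals from FIGS. 12A and 12B:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    src: str   # upstream component
    dst: str   # downstream component
    kind: str  # "control", "scalar", or "collection"


# Serial ports include both control ports and scalar ports.
SERIAL_KINDS = {"control", "scalar"}


def begin_and_finish(components, links):
    """Return (components linked to begin, components linked to finish)."""
    # A component needs the begin component when no serial link feeds it.
    begin = [c for c in components
             if not any(l.dst == c and l.kind in SERIAL_KINDS for l in links)]
    # A component needs the finish component when it has no output link.
    finish = [c for c in components
              if not any(l.src == c for l in links)]
    return begin, finish


components = ["1208", "1210", "1216", "1218"]

# FIG. 12A: one collection link and one scalar link.
links_12a = [Link("1208", "1210", "collection"),
             Link("1216", "1218", "scalar")]
begin_12a, finish_12a = begin_and_finish(components, links_12a)

# FIG. 12B: the same graph plus a control link from 1208 to 1216.
links_12b = links_12a + [Link("1208", "1216", "control")]
begin_12b, finish_12b = begin_and_finish(components, links_12b)
```

Under these assumptions, the sketch reproduces the connections described for both figures: the first, second, and third components are linked to the begin component in FIG. 12A, and the third component drops out of that list in FIG. 12B.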

The control graph effectively defines a distributed state machine in which the components and their serial ports transition from one state to another in response to transitions occurring for upstream components and serial ports. In general, an upstream component will transition from one state to another, causing its output serial ports to transition, which causes linked serial input ports of downstream components to transition, which causes those downstream components to transition, and so on. One example of a specific type of state machine for achieving this behavior is described in greater detail below, with reference to state transition diagrams for components and their serial ports.

To provide control over the transitions of the state machine, the compiler/interpreter 120 grafts additional control code to the code for performing the task represented by a particular component. As used herein, “grafting” means pre-pending, appending, or both pre-pending and appending control code. Control code that is pre-pended is referred to herein as “prologue” code, whereas control code that is appended is referred to as “epilogue” code. Prologue code for a component is executed before the component executes its task. Epilogue code for a component is executed after the component has completed executing its task.

The grafted control code inspects stored state information, such as the value of an accumulator (e.g., a counter counting down to a value indicating that inputs are ready for invoking a component) or the state of a flag (e.g., a flag set to a value indicating that a component has been suppressed), to determine whether or not to cause one or more downstream components to execute their respective tasks.

In one embodiment, prologue code monitors the states of upstream output serial ports and updates the states of the input serial ports of the component and the state of the component, while the epilogue code updates the component's output serial ports after the component completes carrying out its task.

In another embodiment, instead of the prologue code of a downstream component monitoring upstream output serial ports, the epilogue code of an upstream component updates the collective state of downstream input serial ports and monitors that collective state to trigger execution of the prologue code of the downstream component at an appropriate time, such as when a counter initialized to the number of input serial ports reaches zero. Alternatively, instead of a counter counting down from a number of input ports (or counting up to a number of input ports), another form of accumulator can be used to store the state information for triggering a component, such as a bitmap that stores bits representing states of different ports of different components.
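The counter-based variant can be sketched as follows. This is a minimal illustration under stated assumptions rather than the generated control code itself: the class name `TriggerCounter`, the method `port_resolved`, and the use of a lock to guard concurrent updates are all hypothetical choices made for the example.

```python
import threading


class TriggerCounter:
    """Accumulator that fires a downstream prologue when it reaches zero."""

    def __init__(self, num_input_ports, prologue):
        self._remaining = num_input_ports  # one count per input serial port
        self._prologue = prologue          # callable for the downstream prologue
        self._lock = threading.Lock()      # epilogues may run concurrently

    def port_resolved(self):
        """Called from an upstream epilogue when one linked port resolves."""
        with self._lock:
            self._remaining -= 1
            fire = self._remaining == 0
        if fire:
            # Run the prologue outside the lock so it cannot block updates.
            self._prologue()


fired = []
counter = TriggerCounter(2, lambda: fired.append("downstream prologue"))
counter.port_resolved()          # first upstream epilogue: not ready yet
fired_after_first = list(fired)
counter.port_resolved()          # second upstream epilogue: counter hits zero
```

A bitmap accumulator would replace the integer with one bit per port, which additionally records which ports have resolved rather than only how many.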

As a result of this grafted control code, completion of tasks automatically leads to execution of other tasks in a manner consistent with the data and control dependencies that are represented by the control graph and in a manner that permits concurrent operation of multiple components and the use of conditional control logic to control, based on the occurrence of a collection of one or more upstream logical states, when execution of particular components begins and ends.

FIGS. 13A and 13B show state transition diagrams for an example state machine that could be used for components (state transition diagram 1300 of FIG. 13A) and for their serial ports (state transition diagram 1310 of FIG. 13B). The state transition diagrams are similar except that since the active state 1304 is associated with ongoing execution, and since only components and not ports carry out execution, only a component can be in the active state 1304.

All of the possible states of both state transition diagrams will be described, as well as the conditions necessary to follow each transition between the states, referring as needed to FIGS. 13A and 13B. All of the input and output ports referred to in this description of the state transition diagrams are serial ports, since the components in the control graph only need to link serial ports (and not collection ports). A particular component in a control graph can be in one of the four logical states of the state transition diagram 1300. The first state is the pending state 1302. This is the state a component starts in when the execution set associated with the control graph begins execution. A component remains in the pending state 1302 if any input port of the component is in the pending state 1312. If a component happens to have no input ports, it starts in the pending state 1302 but is immediately eligible to transition out of the pending state 1302.

From the pending state 1302, the component can transition into either the active state 1304 or the suppressed state 1306.

A component transitions into the active state 1304 if none of its input ports is in the pending state 1312 and not all of its input ports are in the suppressed state 1316 (i.e., at least one input port is in the complete state 1314). Ports are “required” by default, but may be marked as “optional”. An optional port can be left unconnected to another port without causing an error (though there may be a warning). Any optional port left unconnected is automatically in the complete state 1314. A component remains in the active state 1304 as long as it is still executing its task. While a component is in the active state 1304, its output ports can transition, either at different times or together, from the pending state 1312 to either the complete state 1314 or the suppressed state 1316. Upon completing execution of its task, the component transitions from the active state 1304 into the complete state 1308.

A component transitions into the complete state 1308 if the component's task has finished executing, and all of its output ports are “resolved,” i.e., no longer pending.

A component is in the suppressed state 1306 if the component's prologue has triggered a transition to the suppressed state 1306, either due to custom control logic, due to all of its input ports being suppressed, due to suppression of at least one of its required input ports, or due to an unhandled error in the component. All of the component's output ports also resolve to the suppressed state 1316 to propagate the suppression downstream.
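The rules for leaving the pending state can be summarized in a small decision function. The sketch below is an illustration only: the enum and function names are hypothetical, each input port is modeled as a pair of its state and a flag saying whether it is required, and custom control logic and unhandled errors are left out. Treating a component with no input ports as eligible to become active is an assumption based on the statement that such a component may leave the pending state immediately.

```python
from enum import Enum


class PortState(Enum):
    PENDING = "pending"        # state 1312
    COMPLETE = "complete"      # state 1314
    SUPPRESSED = "suppressed"  # state 1316


class ComponentState(Enum):
    PENDING = "pending"        # state 1302
    ACTIVE = "active"          # state 1304
    SUPPRESSED = "suppressed"  # state 1306
    COMPLETE = "complete"      # state 1308


def next_from_pending(input_ports):
    """Decide where a pending component goes next.

    input_ports: list of (PortState, required) pairs for its input ports.
    """
    states = [state for state, _ in input_ports]
    # Any pending input keeps the component pending.
    if any(s is PortState.PENDING for s in states):
        return ComponentState.PENDING
    # Assumption: with no input ports the component may start right away.
    if not input_ports:
        return ComponentState.ACTIVE
    # All inputs suppressed, or at least one required input suppressed.
    if all(s is PortState.SUPPRESSED for s in states):
        return ComponentState.SUPPRESSED
    if any(s is PortState.SUPPRESSED and required
           for s, required in input_ports):
        return ComponentState.SUPPRESSED
    # Otherwise at least one input is complete and nothing blocks execution.
    return ComponentState.ACTIVE
```

For example, a component with one complete required input and one suppressed optional input becomes active, whereas the same component with the suppressed input marked as required is suppressed.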

For ports, the state transition rules depend on whether the port is an input port or an output port.

The initial state for a port is the pending state 1312. An input port generally follows the state of an upstream output port to which it is linked. Thus, when an upstream output port transitions, the input port linked to that output port in the control graph transitions into the same state. An output port remains pending until the component, during its active state, determines what state the output port should resolve to.

As noted above, input ports follow upstream output ports to which they are linked. Thus, for an input port linked to a single upstream output port, that input port transitions into the complete state 1314 when the upstream output port to which it is linked transitions into the complete state 1314. If an input port is linked to multiple upstream output ports through multiple links, then the input port transitions into the complete state 1314 after at least one of its upstream output ports transitions to the complete state 1314. Otherwise, if all upstream output ports transition to the suppressed state 1316, then the input port transitions to the suppressed state 1316. Some embodiments use other logic different from this default “OR logic” to determine whether to transition an input port to the complete state 1314 or suppressed state 1316 (e.g., “AND logic” where an input port transitions to the complete state 1314 only if all upstream output ports are in the complete state 1314). If a component's input data port resolves to the complete state 1314, a data element is ready for that component to process. If a component's output data port resolves to the complete state 1314, a data element is ready to send downstream from that component.

Consistent with the rule that input ports follow the state of upstream output ports to which they are linked, an input port resolves to the suppressed state 1316 when an upstream output port to which it is linked resolves to the suppressed state 1316. An output port resolves to the suppressed state 1316 either because an active component computed a result that determined the output port should be suppressed, or to enable suppression from an upstream suppressed component to propagate downstream, or if there was an unhandled error in the component. In some embodiments, it is possible for the compiler to optimize execution by suppressing a tree of downstream components having a root at a suppressed component without having to have suppression propagate downstream on a component-by-component basis.
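The default “OR logic” and the alternative “AND logic” for an input port with several upstream output ports can be sketched as follows. The names `PortState` and `resolve_input_port` are hypothetical. The description does not spell out when an input port is suppressed under “AND logic”, so the sketch assumes that a single suppressed upstream port suffices. It also treats an input port with no upstream links as complete, following the rule for unconnected optional ports.

```python
from enum import Enum


class PortState(Enum):
    PENDING = "pending"        # state 1312
    COMPLETE = "complete"      # state 1314
    SUPPRESSED = "suppressed"  # state 1316


def resolve_input_port(upstream, logic="or"):
    """Return the state of an input port given its upstream port states."""
    if not upstream:
        # Unconnected optional port: automatically complete.
        return PortState.COMPLETE
    complete = [s is PortState.COMPLETE for s in upstream]
    suppressed = [s is PortState.SUPPRESSED for s in upstream]
    if logic == "or":
        # Complete as soon as any upstream port completes; suppressed
        # only once every upstream port has been suppressed.
        if any(complete):
            return PortState.COMPLETE
        if all(suppressed):
            return PortState.SUPPRESSED
    elif logic == "and":
        # Complete only when every upstream port is complete.
        if all(complete):
            return PortState.COMPLETE
        # Assumption: one suppressed upstream port suppresses the input.
        if any(suppressed):
            return PortState.SUPPRESSED
    else:
        raise ValueError(f"unknown logic: {logic!r}")
    return PortState.PENDING
```

With one complete and one suppressed upstream port, the default logic resolves the input port to the complete state, while the assumed “AND logic” resolves it to the suppressed state.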

In other embodiments, any of a variety of alternative state machines could be used, in which links between collection ports could also be included in the control graph. In some such embodiments, a state transition diagram for collection ports could include an active state in addition to the pending, complete, and suppressed states, such as in the state transition diagram 1300 for components. A collection port is in the active state when it is producing (as an output port) data, or consuming (as an input port) data. For an input collection port, for example, the active state could be triggered when the first data element is produced upstream, as soon as it is determined that not all input ports will be suppressed. In some embodiments, there is no suppressed state for collection ports. The transition rules followed by components in a control graph that includes state transitions for collection ports may handle the active state for an input collection port in the same manner that the complete state was handled for an input scalar port or control port.

6 Computing Platform

Referring back to FIG. 1, instances of components of the data processing graph are spawned as tasks in the context of executing a data processing graph and are generally executed in multiple of the computing nodes 152 of the computing platform 150. As discussed in more detail below, the controller 140 provides supervisory control aspects of the scheduling and locus of execution of those tasks in order to achieve performance goals for the system, for example, related to allocation of computation load, reduction in communication or input/output overhead, and use of memory resources.

Generally, after translation by the compiler/interpreter 120, the overall computation is expressed as a task-based specification 130 in terms of procedures of a target language that can be executed by the computing platform 150. These procedures make use of primitives, such as “spawn” and “wait” and may include within them or call the work procedures specified by a programmer for components in the high-level (e.g., graph-based) program specification 110.

In many instances, each instance of a component is implemented as a task, with some tasks implementing a single instance of a single component, some tasks implementing a single instance of multiple components of an execution set, and some tasks implementing successive instances of a component. The particular mapping from components and their instances to tasks depends on the particular design of the compiler/interpreter, such that the resulting execution remains consistent with the semantic definition of the computation.

Generally, tasks in the runtime environment are arranged hierarchically, for example, with one top-level task spawning multiple tasks, for example, one for each of the top-level components of the data processing graph. Similarly, computation of an execution set may have one task for processing an entire collection, with multiple (i.e., many) sub-tasks each being used to process an element of the collection.

In the runtime environment, each task that has been spawned may be in one of a set of possible states. When first spawned, a task is in a Spawned state prior to being initially executed. When executing, it is in an Executing state. From time to time, the task may be in a Suspended state. For example, in certain implementations, a scheduler may put a task into a Suspended state when it has exceeded its quantum of processor utilization, is waiting for a resource, etc. In some implementations, execution of tasks is not preempted, and a task must relinquish control. There are three Suspended substates: Runnable, Blocked, and Done. A task is Runnable, for example, if it relinquished control before it had completed its computation. A task is Done when it has completed its processing, for example, prior to the parent task retrieving a return value of that task. A task is Blocked if it is waiting for an event external to that task, for example, completion of another task (e.g., because it has used the “wait for” primitive), or availability of a data record (e.g., blocking on execution of an in.read( ) or out.write( ) function).

Referring again to FIG. 1, each computing node 152 has one or more processing engines 154. In at least some implementations, each processing engine is associated with a single operating system process executing on the computing node 152. Depending on the characteristics of the computing node, it may be efficient to execute multiple processing engines on a single computing node. For example, the computing node may be a server computer with multiple separate processors, or the server computer may have a single processor that has multiple processor cores, or there may be a combination of multiple processors with multiple cores. In any case, executing multiple processing engines may be more efficient than using only a single processing engine on a computing node 152.

One example of a processing engine is hosted in the context of a virtual machine. One type of virtual machine is a Java Virtual Machine (JVM), which provides an environment within which tasks specified in compiled form as Java Bytecode may be executed. But other forms of processing engines, which may or may not use a virtual machine architecture, can be used.

Referring to FIG. 14, each of the processing engines 154 of a computing node 152 has one or more runners 1450. Each runner 1450 uses one or more processes or process threads to execute runnable tasks. In some implementations, each runner has an associated process thread, although such an association of runners with threads is not necessary. At any time, each runner is executing at most one runnable task of the computation. Each runner has a separate runnable queue 1466. Each runnable task of the computation is in one runnable queue 1466 of a runner 1450 of the system. Each runner 1450 has a scheduler/interpreter 1460, which monitors a currently running task, and when that task changes state to Done, Blocked, or Suspended, selects another task from the runnable queue 1466 and executes it. Tasks are associated with runners, and a runner's tasks that are not runnable are maintained outside the runnable queue 1466, for example as illustrated in a blocked and done queue 1468.

Runners 1450 may be created when the processing engines 154 are initialized, for example, creating a preconfigured number of runners per engine. As discussed below, in some implementations, runners may be added or removed from processing engines, and processing engines themselves may be added and removed from the computing platform 150, even during execution of a data processing graph. For an initial description below, however, we assume that the number of processing engines and the number of runners within each processing engine remain constant.

As an example, processing for a data processing graph begins with execution of the Main procedure in a top-level task. For example, the task-based controller 140 instructs one of the computing nodes, by communicating with a monitor 1452 of one of the processing engines 154, to begin execution of the Main procedure. In this example, the monitor 1452 places a task for executing the Main procedure in the runnable queue 1466 of one of the runners of that processing engine. In this example, the runner is idle (i.e., there are no other tasks running at the time, and no other runnable tasks in the runnable queue), so the scheduler/interpreter 1460 of that runner retrieves the task from the runnable queue and begins execution of the task. When the procedure is expressed in a language that needs to be interpreted, the scheduler/interpreter 1460 interprets successive statements of the procedure.

In this example, the first statement of the Main procedure creates (i.e., allocates memory for) link buffers 1470 for links supporting the flow of unordered collections, which in this example includes unordered unbounded buffers buffer1, buffer2, and buffer3. Various approaches are used for creating this type of inter-component link, and for managing associated computing resources for these links (including link buffers 1470), which include any link whose upstream port is a collection port. In some examples, the link buffers 1470 include buffers for output collection ports representing the source of a collection and separate buffers for input collection ports representing the destination of a collection. These buffers may be allocated at runtime just before processing on the collection begins, and deallocated (i.e., freeing the memory used for the buffer) just after processing on the collection ends. In this example, these link buffers 1470 are allocated in a memory of the processing engine 154 in which the runner of the task is executing. In general, the memory in which the buffers are created is semiconductor random access memory (RAM), although in some implementations, other storage devices such as disks may be used to store at least some of the buffer data. Note that in other approaches, the buffer may be local to the runner itself. In practice, if the processing engine 154 is implemented as an operating system process, the buffers are created as memory regions in the address space of that process. Therefore, direct hardware address based access to the buffers is limited to instructions that execute within that process. Note that in such an approach, at least some synchronization and access control to the buffers, for example, using locks or semaphores, may be needed if multiple runners will be able to read or write to the buffers. In approaches in which each runner is implemented as a single thread within an operating system process, the buffers may be associated with a particular runner, and all access may be restricted to that runner, thereby avoiding potential contention from multiple threads. In the discussion below, we assume that the buffers are accessible from any runner in the processing engine, and that suitable access control is implemented to allow such shared access.

The next steps of the Main process involve a spawn or forall primitive that is invoked by the Main process. In general, at least by default, spawning of a child task or tasks causes those tasks to be initially formed in the same runner as the parent. For example, the spawn Work_Read_External_Data task is spawned on the same runner. To the extent that the task is accessing external data, the task may make use of an I/O interface 1464 to that external data. For example, that interface may consist of an open connection to an external database, an endpoint of a network data connection, etc. Such I/O interfaces may be bound to the particular runner and therefore the task using that interface may be required to access the interface only from that runner, as is discussed further below in the context of potential migration of tasks between runners. In this example we assume that the task fills buffer1 in a manner that is reasonably metered and does not “overwhelm” the system, for example, by causing buffer1 to grow beyond the capacity of the processing engine. Approaches to aspects of control, for example, to avoid congestion or exhaustion of resources are also discussed below.

Concurrent with execution of the Work_Read_External_Data task, the forall Work_A causes tasks to be spawned for each of the records that are read from buffer1. In particular, the “forall” primitive causes multiple instances of a task identified by an argument of the primitive to be executed, where the number of instances is determined typically by the number of data elements received at runtime, and where the location at which they are executed and the order in which they are invoked can be left unrestricted by the compiler for later determination at runtime. As discussed above, by default these tasks are also created on the same runner 1450, and again absent other controls, are spawned as fast as data is available from buffer1. Tasks for Work_B and Work_Write_External_Data are similarly created on the same runner.

Note that the task-based specification makes use of “forall” primitives without explicitly specifying how the runtime controller will implement the distribution of the tasks to cause all the data to be processed. As discussed above, one approach that may be used by the runtime controller is to spawn separate tasks on the same computing node, and then rely on migration features to cause the tasks to execute on separate nodes, thereby balancing load. Other approaches may be used in which a “forall” primitive causes tasks to be executed directly on multiple nodes. In the case of a cursor defining an index-based subset of rows of a table of the in-memory database, an implementation of a cursor forall primitive may cause the cursor to be split into parts each associated with records stored on different nodes, and tasks are spawned for the separate parts of the cursor on the different nodes, thereby causing locality of the processing and the data storage. But it should be understood that a wide range of approaches may be implemented in one or more embodiments of a runtime controller and distributed computing platform to execute the “forall” primitives used in the task-based specification 130 that is the output of the compiler 120. In some examples, the selection of approach may be dependent on runtime decisions, for example, based on number of records, distribution of data over computing nodes, load on the nodes, etc. In any case, the approach used to implement the “forall” primitives is not necessarily known to the developer of the data processing graph or to the designer of the compiler.

A feature of the system is that tasks may be transferred between runners after they are created. Very generally, one way such transfer of tasks is implemented is by a “stealing” or “pull” mechanism in which a runner that is idle, or at least lightly loaded, causes tasks from another runner to be transferred to it. Although a variety of criteria may be used, the number of runnable tasks in a runner's runnable queue 1466 may determine if that runner should seek tasks to steal from other runners based on a local criterion such as whether fewer than a threshold number of tasks is in its runnable queue. In some implementations a more global decision process may be used to rebalance the task queues on multiple runners, but the overall effect is similar.
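The local stealing criterion can be illustrated with a short sketch. It models only the threshold test on the runnable queue and the transfer of a queue entry. The class name `Runner`, the parameter `steal_threshold`, and the choice to take the most recently queued task from the other runner are all assumptions made for the illustration.

```python
from collections import deque


class Runner:
    """Toy runner holding only a runnable queue."""

    def __init__(self, name, steal_threshold=1):
        self.name = name
        self.runnable = deque()                 # runnable queue
        self.steal_threshold = steal_threshold  # hypothetical tuning knob

    def should_steal(self):
        # Local criterion: fewer than a threshold number of runnable tasks.
        return len(self.runnable) < self.steal_threshold

    def steal_from(self, victim):
        """Pull one task from another runner; return it, or None."""
        if not self.should_steal() or not victim.runnable:
            return None
        # Assumption: take the task at the tail of the victim's queue.
        task = victim.runnable.pop()
        self.runnable.append(task)
        return task


busy = Runner("busy")
busy.runnable.extend(["task1", "task2", "task3"])
idle = Runner("idle")

stolen = idle.steal_from(busy)   # the idle runner pulls one task
again = idle.steal_from(busy)    # no longer below the threshold
```

A more global rebalancing process would replace the per-runner `should_steal` test with a decision made over the queue lengths of several runners.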

In at least some embodiments, stealing of a task from one runner to another does not necessarily involve transferring all the data for that task. For example, only data accessible in a current execution “frame” (e.g., the data for the local and global variables accessible from the current program scope, for example, a current subroutine call) are packaged along with a reference back to the task's “home” runner. This data is sufficient to make a runnable copy of the task at the destination runner of the migration and an entry in the destination runnable queue is ready for execution in that runner.

When a migrated task completes execution, or exhausts the data transferred to the destination runner by returning from the program scope for which the local variables were available, the task is transferred back to the home runner, where the data for the task is merged and the task is once again made runnable at its home runner.

Note that during transfer of a task within a single processing engine, the communication between runners may be through local memory (i.e., avoiding disk or network communication), thereby consuming relatively few resources. In implementations that permit stealing and migration between processing engines, while in transit from one runner to another the task consumes relatively few resources, for example, primarily consuming communication resources between processing engines rather than computation resources. Furthermore, the latency of such communication is relatively insignificant because the home and destination runners are presumed to be busy computing during the transfer, the home runner because its runnable queue was heavily populated and therefore unlikely to empty and the destination runner because the stealing is done in anticipation of the runnable queue at the destination being emptied.

In the example of execution for the tasks associated with the computations in a data processing graph, the task stealing mechanism distributes the load for the computation across the runners of one or more processing engines. Note, however, that certain data access is limited to a particular runner (or possibly to a particular processing engine). For example, as outlined above, the data for buffer2 may be accessible by a single runner (or possibly a group of runners), and yet a Work_A task, which may need to write to buffer2, may have been stolen by a runner that is not able to write to buffer2. In such cases, when a task needs to take an action that must be executed at a different runner than where that task is currently executing, the task is migrated to a suitable runner in a “migration” or “push” manner.

In at least some examples, the computation platform 150 supports a global data storage for a set of (key,value) pairs for global variables. This data storage may be distributed across memory (e.g., RAM or disk) on multiple computing nodes (or processing engines). The name space of keys is global in the sense that a specification of a key has the same meaning at all computing nodes 152 and their runners 1450. The values for these variables persist while tasks are instantiated, execute, and terminate, thereby providing a way of passing information between tasks without requiring that such information be passed from one task to another via a common parent task. As discussed below, access to values according to keys is controlled so that the use and updating of the values does not cause conflicts among tasks. In some examples, tasks gain exclusive access to a particular (key,value) pair for some or all of their execution.

In general, storage for the (key,value) pairs is distributed, and any particular (key,value) pair is associated with a particular computing node 152. For example, the (key,value) pair is stored in a distributed table storage 1480 at that computing node. In some implementations, the spawn primitive permits specification of a key and a mapping of the associated variable into a local variable of the task. When a key is specified, the task that is spawned gains exclusive access to the key for the duration of its execution. Prior to execution beginning, the value is passed from the storage into the local context of the task, and after execution completes, the value in the local context is passed back to the global storage. If a spawn primitive specifies a key that is in use by another executing task, the newly spawned task is blocked until it can gain exclusive access to the key. In some implementations, each computing node can determine the home node for a particular key, and when a task is requested to be spawned, that request is handled by the computing node at which the (key,value) pair is resident, and the execution of the task will initially begin at that node. In alternative embodiments, other approaches for gaining similar exclusive access to such global shared (key,value) pairs do not necessarily involve initiating tasks in the same location as the storage, for example, by communicating requests for exclusive access and later communicating releases of the exclusive access with the updated value for the key. Tasks can create new (key,value) pairs, which by default are stored on the node at which the task is running when the new (key,value) pair is created.
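The exclusive-access behavior of the spawn primitive can be sketched, under simplifying assumptions, on a single node using threads. The class KeyedStore and its methods create, read, and spawn are hypothetical names chosen for illustration; the description does not define such an interface, and a distributed implementation would additionally route each spawn request to the home node of the key.

```python
import threading
from typing import Any, Callable, Dict


class KeyedStore:
    """Single-node stand-in for the global (key,value) storage (illustrative only)."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()   # protects the two dictionaries themselves

    def create(self, key: str, value: Any) -> None:
        with self._guard:
            self._values[key] = value
            self._locks[key] = threading.Lock()

    def read(self, key: str) -> Any:
        with self._guard:
            return self._values[key]

    def spawn(self, key: str, body: Callable[[Any], Any]) -> threading.Thread:
        """Start a task that holds exclusive access to `key` while `body` runs."""
        with self._guard:
            key_lock = self._locks[key]

        def run() -> None:
            with key_lock:                       # blocks while another task holds the key
                with self._guard:
                    local = self._values[key]    # value passed into the local context
                local = body(local)              # task executes on its local copy
                with self._guard:
                    self._values[key] = local    # value passed back on completion

        task = threading.Thread(target=run)
        task.start()
        return task


store = KeyedStore()
store.create("count", 0)
tasks = [store.spawn("count", lambda v: v + 1) for _ in range(50)]
for t in tasks:
    t.join()
print(store.read("count"))   # 50
```

Because each task holds the per-key lock from copy-in to copy-back, the fifty increments are serialized and none is lost.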

One use of global state variables is for aggregation during execution of a function of successive records of a collection. For example, rather than the value being a single item, the global storage maintains a window of values that are assigned to the key. Therefore, in the programming model, a value can be added to the history maintained in association with the key, and a function of the previously added values can be provided. The window of values may be defined according to a number of items (e.g., the last 100 items) or by a time window (e.g., the items added in the last 10 minutes, defined by the times the values were added or by explicit time stamps provided with each value as it is added). Note that the programming model does not require explicit deletion of old values that fall outside the window, with the definition of the window allowing implementations to perform such deletion automatically. The programming model includes a number of primitives for creating such window-based keyed global variables (e.g., defining the nature and extent of the window), adding values to the key, and computing functions (e.g., maximum, average, number of distinct values, etc.) of the window of values. Some primitives combine the addition of a new value for the key and returning of the function of the window (e.g., add the new value to the key and return the average of the last 100 values added).
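A minimal sketch of such a window-based value follows. The class name WindowedValue and the methods add, compute, and add_and_compute are hypothetical, not primitives named in the description. The sketch assumes explicit time stamps that arrive in non-decreasing order, and it evicts old values automatically whenever the window is touched.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class WindowedValue:
    """History of values for one key, bounded by item count and/or age (illustrative)."""

    def __init__(self, max_items: Optional[int] = None,
                 max_age: Optional[float] = None) -> None:
        self.max_items = max_items
        self.max_age = max_age
        self._items: Deque[Tuple[float, float]] = deque()   # (timestamp, value)

    def _evict(self, now: float) -> None:
        # Drop values older than the time window, then trim to the item limit.
        if self.max_age is not None:
            while self._items and now - self._items[0][0] > self.max_age:
                self._items.popleft()
        if self.max_items is not None:
            while len(self._items) > self.max_items:
                self._items.popleft()

    def add(self, value: float, timestamp: float) -> None:
        self._items.append((timestamp, value))
        self._evict(timestamp)

    def compute(self, fn: Callable[[List[float]], float], now: float) -> float:
        self._evict(now)
        return fn([v for _, v in self._items])

    def add_and_compute(self, value: float, timestamp: float,
                        fn: Callable[[List[float]], float]) -> float:
        """Combined primitive: add a value and return a function of the window."""
        self.add(value, timestamp)
        return self.compute(fn, timestamp)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


last3 = WindowedValue(max_items=3)
for t, v in enumerate([1.0, 2.0, 3.0, 4.0]):
    last3.add(v, timestamp=float(t))
print(last3.compute(max, now=3.0))                      # 4.0
print(last3.add_and_compute(5.0, 4.0, mean))            # 4.0 (mean of 3, 4, 5)

recent = WindowedValue(max_age=10.0)
recent.add(1.0, timestamp=0.0)
recent.add(2.0, timestamp=5.0)
recent.add(3.0, timestamp=12.0)                         # the value added at t=0 falls out
print(recent.compute(len, now=12.0))                    # 2
```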

In at least some examples, the global storage also includes shared record-oriented data that is accessed via identifiers referred to as handles. For example, a handle may identify a source or a sink of data records, or as another example, a handle may identify a particular record in a data set. Generally, the handles are typed in that a handle provides a way of accessing data and also provides a definition of the structure of the data being accessed. For example, a handle may have associated with it the field (column) structure of a data record.

In at least some examples, the global storage (e.g., in memory of the computing nodes) includes a table storage for one or more tables of rows of typed data, with the tables or particular records of tables again being accessed via identifiers referred to as handles. A table's row type may be a hierarchical record type, with vectors, vectors of records, etc. In some examples, a table may have one or more indices that provide hash- or B-tree (ordered) access to rows, and a cursor can be created from a table, an index, or an index and key value(s). Rows may be inserted, updated, or deleted individually. In order to support transaction processing, a task may lock one or multiple rows of one or more tables, for example, for read or update access during processing for a component of the data processing graph. Tables can be treated as collections for data parallel operations, for example, as sources or destinations of data in a data processing graph. In general, the tables are indexed, and a subset of rows of a table may be selected based on the index yielding a cursor, and that cursor is then used to provide the selected rows as a data source. In some examples, further primitives are available to a task for actions such as splitting a cursor and estimating a number of records associated with a handle. When a cursor is provided as a source of data for an execution set, the cursor may be split into parts, each providing some of the rows of the table to a corresponding instance of the execution set, thereby providing parallelism and, with appropriate splitting of the cursor, enabling execution on nodes at which the rows are stored. A data table may also be accessed by a task implementing a transaction such that modifications of the data table are maintained so as not to be visible outside a task until those modifications are committed explicitly by a task. In some examples, such transaction support may be implemented by locking one or more rows of a table, while in other examples, more complex approaches involving multiple versions of rows may be implemented to provide greater potential concurrency than may be provided solely using locks.
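The cursor primitives mentioned above (selecting rows through an index, estimating a record count, and splitting a cursor) can be illustrated with the following sketch. Table, Cursor, select, estimate, and split are hypothetical names; the sketch assumes a single hash index on one field and splits by contiguous position, whereas an actual implementation might split according to the nodes at which the rows are stored.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Cursor:
    row_ids: List[int]                     # rows selected through an index lookup

    def estimate(self) -> int:
        """Number of records the cursor would provide (exact in this sketch)."""
        return len(self.row_ids)

    def split(self, parts: int) -> List["Cursor"]:
        """Divide the cursor into `parts` cursors of nearly equal size."""
        size, extra = divmod(len(self.row_ids), parts)
        pieces, start = [], 0
        for i in range(parts):
            end = start + size + (1 if i < extra else 0)
            pieces.append(Cursor(self.row_ids[start:end]))
            start = end
        return pieces


class Table:
    """In-memory table with one hash index (illustrative only)."""

    def __init__(self, index_field: str) -> None:
        self.index_field = index_field
        self.rows: Dict[int, Dict[str, Any]] = {}
        self.index: Dict[Any, List[int]] = {}

    def insert(self, row_id: int, row: Dict[str, Any]) -> None:
        self.rows[row_id] = row
        self.index.setdefault(row[self.index_field], []).append(row_id)

    def select(self, key: Any) -> Cursor:
        return Cursor(list(self.index.get(key, [])))

    def read(self, cursor: Cursor) -> List[Dict[str, Any]]:
        return [self.rows[i] for i in cursor.row_ids]


table = Table(index_field="region")
for i in range(10):
    table.insert(i, {"region": "east" if i < 7 else "west", "amount": i})

cursor = table.select("east")
parts = cursor.split(3)                    # one part per execution-set instance
print(cursor.estimate())                   # 7
print([p.estimate() for p in parts])       # [3, 2, 2]
print([sum(r["amount"] for r in table.read(p)) for p in parts])   # [3, 7, 11]
```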

Files, data flows, and in-memory tables are all examples of what are referred to as collections. A reader task reads records from a collection, and a writer task writes records to a collection. Some tasks are both readers and writers.

As introduced above, flows representing collections may be implemented in the runtime system using in-memory buffers. Alternatively, any form of storage can be used in various implementations, including tables within a database, or a distributed storage system. In some implementations, an in-memory distributed database is used. In some implementations, the compiler implements such flows using in-memory tables in a manner that is not necessarily exposed to the developer of the data processing graph. For example, the compiler may cause an upstream component to populate rows of a table, and a downstream component to read previously populated rows, thereby implementing an unordered data flow. The runtime controller may invoke multiple instances of a task corresponding to an execution set to process the driving data elements from an upstream collection port by retrieving the data elements from the storage in a different order than they were received into the storage, and in a manner that prevents certain forms of blocking. For example, the instances of the task can be invoked without blocking invocation of any of the instances by any particular other instance (i.e., until after any particular other instance completes processing one or more data elements).
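By way of a hedged illustration, an unordered flow backed by previously populated rows might be driven as in the following sketch, which uses a Python thread pool in place of the runtime controller. The function name run_execution_set is hypothetical, and a plain list stands in for the in-memory table; the point shown is only that each instance is submitted without waiting on any other instance, and that results are collected in completion order rather than arrival order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_execution_set(rows: List[T], task: Callable[[T], R],
                      max_workers: int = 4) -> List[R]:
    """Invoke one task instance per stored data element, in no particular order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting an instance never waits for another instance to finish.
        futures = [pool.submit(task, row) for row in rows]
        # Results are gathered as instances complete, which need not match
        # the order in which the rows were written by the upstream component.
        return [f.result() for f in as_completed(futures)]


upstream_rows = [1, 2, 3, 4, 5]            # rows populated by an upstream component
results = run_execution_set(upstream_rows, lambda x: x * x)
print(sorted(results))                     # [1, 4, 9, 16, 25]
```

The results are sorted before printing because the completion order is not deterministic.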

In general, a record in a collection may have a handle prior to the data in that record being first written. For example, a table may be set up as the destination of an indexed set of records, and individual records may have handles even before the data for those records are written.

7 Implementations

The approach described above can be implemented, for example, using a programmable computing system executing suitable software instructions, or it can be implemented in suitable hardware such as a field-programmable gate array (FPGA) or in some hybrid form. For example, in a programmed approach the software may include procedures in one or more computer programs that execute on one or more programmed or programmable computing systems (which may be of various architectures such as distributed, client/server, or grid), each including at least one processor, at least one data storage system (including volatile and/or non-volatile memory and/or storage elements), and at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port). The software may include one or more modules of a larger program, for example, that provides services related to the design, configuration, and execution of data processing graphs. The modules of the program (e.g., components of a data processing graph) can be implemented as data structures or other organized data conforming to a data model stored in a data repository.

The software may be stored in non-transitory form, such as being embodied in a volatile or non-volatile storage medium, or any other non-transitory medium, using a physical property of the medium (e.g., surface pits and lands, magnetic domains, or electrical charge) for a period of time (e.g., the time between refresh periods of a dynamic memory device such as a dynamic RAM). In preparation for loading the instructions, the software may be provided on a tangible, non-transitory medium, such as a CD-ROM or other computer-readable medium (e.g., readable by a general or special purpose computing system or device), or may be delivered (e.g., encoded in a propagated signal) over a communication medium of a network to a tangible, non-transitory medium of a computing system where it is executed. Some or all of the processing may be performed on a special purpose computer, or using special-purpose hardware, such as coprocessors or field-programmable gate arrays (FPGAs) or dedicated, application-specific integrated circuits (ASICs). The processing may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computing elements. Each such computer program is preferably stored on or downloaded to a computer-readable storage medium (e.g., solid state memory or media, or magnetic or optical media) of a storage device accessible by a general or special purpose programmable computer, for configuring and operating the computer when the storage device medium is read by the computer to perform the processing described herein. The inventive system may also be considered to be implemented as a tangible, non-transitory medium, configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the processing steps described herein.

A number of embodiments of the invention have been described. Nevertheless, it is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the following claims. Accordingly, other embodiments are also within the scope of the following claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order independent, and thus can be performed in an order different from that described.

What is claimed is:
 1. A method for graph-based computation, the method including: accepting specification information for the graph-based computation, the specification information including a plurality of graph elements, and providing a visual representation of the specification information to a user; determining, on a first computation system, a visual representation of a plurality of groups of the graph elements based on the accepted specification information, including determining a spatial extent of a spatial region for at least a first group of the plurality of groups based at least in part on a spatial extent of each of a plurality of graph elements; and providing a visual representation of spatial regions for the plurality of groups to the user in conjunction with the visual representation of the specification information, the visual representation of each spatial region including visual representations of at least some of the graph elements in the group corresponding to that spatial region.
 2. The method of claim 1 wherein a visual representation of the spatial region for the first group is contained within a visual representation of a spatial region for a second group of the plurality of groups, according to a nesting of the first group of graph elements within the second group of graph elements, where (1) the first group of graph elements is a subset of fewer than all graph elements in the first group of graph elements, and (2) each graph element in the first group of graph elements is directly connected to at least one other graph element in the first group of graph elements within the graph-based computation.
 3. The method of claim 1 wherein determining the visual representation of the plurality of groups includes processing the accepted specification information to form the plurality of groups.
 4. The method of claim 3 further including causing an execution of graph-based computation on a second computation system to be consistent with the formed plurality of groups.
 5. The method of claim 4 wherein causing execution of the graph-based computation to be consistent with the formed groups includes forming an executable representation of the graph-based computation from the specification information and the formed groups.
 6. The method of claim 3 further including forming a runtime specification of the graph-based computation according to the formed plurality of groups, for controlling an execution of graph-based computation on a second computation system.
 7. The method of claim 1 wherein the specification information for the graph-based computation includes a specification of the plurality of graph elements, the specification of each graph element including a location of a visual representation of the graph element in a visual representation of the graph-based computation.
 8. The method of claim 1 wherein determining the visual representation of the plurality of groups of the graph elements includes: forming a first characterization of a candidate set of outlines enclosing the spatial regions for the groups; and determining a second characterization of a final set of outlines enclosing the spatial regions for the groups from the first characterization.
 9. The method of claim 8 wherein forming the first characterization includes forming a tessellation of at least a part of the visual representation surrounding the graph elements.
 10. The method of claim 9 wherein forming the first characterization includes identifying intersections of edges of tiles of the tessellation and the set of outlines.
 11. The method of claim 10 wherein determining the second characterization includes modifying the intersections.
 12. The method of claim 11 wherein modifying the intersections includes constraining the modified intersections according to separation limits between outlines or between outlines and graph elements.
 13. The method of claim 11 wherein determining the second characterization further includes smoothing an outline formed by joining the intersections.
 14. The method of claim 8 wherein the graph elements form a partially ordered set, and forming the first characterization includes determining a number of outlines separating pairs of graph elements according to the partial ordering.
 15. The method of claim 14 wherein forming the first characterization includes determining intersections of lines between visual representations of graph elements and the set of outlines according to the number of outlines separating the graph elements.
 16. The method of claim 8 wherein determining the second characterization of a final set of outlines includes reducing a length of each of the candidate set of outlines to form the final set of outlines.
 17. The method of claim 16 wherein reducing the length is constrained by separation limits between outlines or between outlines and visual representations of graph elements.
 18. The method of claim 8 wherein at least some spatial region for a group of graph elements includes a disconnected spatial region.
 19. The method of claim 1 wherein each of the graph elements in the plurality of graph elements includes nodes in a graph that includes nodes interconnected by links.
 20. The method of claim 19 wherein each of one or more of the graph elements in the plurality of graph elements represents a computation step within the graph-based computation.
 21. The method of claim 1 wherein the visual representation of each spatial region includes visual representations of at least some of the graph elements in the group corresponding to that spatial region.
 22. The method of claim 1 wherein the spatial extent of the spatial region for the first group is specified by an outline enclosing the spatial region for the first group.
 23. Software stored in a non-transitory form on a computer-readable medium, for graph-based computation, the software including instructions for causing a computation system to: accept specification information for the graph-based computation, the specification information including a plurality of graph elements, and provide a visual representation of the specification information to a user; determine a visual representation of a plurality of groups of the graph elements based on the accepted specification information, including determining a spatial extent of a spatial region for at least a first group of the plurality of groups based at least in part on a spatial extent of each of a plurality of graph elements; and provide a visual representation of spatial regions for the plurality of groups to the user in conjunction with the visual representation of the specification information, the visual representation of each spatial region including visual representations of at least some of the graph elements in the group corresponding to that spatial region.
 24. A computation system for graph-based computation, the computation system including: an input device or port configured to accept specification information for the graph-based computation, the specification information including a plurality of graph elements, an output device or port configured to provide a visual representation of the specification information to a user; and at least one processor configured to determine a visual representation of a plurality of groups of the graph elements based on the accepted specification information, including determining a spatial extent of a spatial region for at least a first group of the plurality of groups based at least in part on a spatial extent of each of a plurality of graph elements; wherein a visual representation of spatial regions for the plurality of groups is provided to the user in conjunction with the visual representation of the specification information, the visual representation of each spatial region including visual representations of at least some of the graph elements in the group corresponding to that spatial region.